SUMMARY


This report documents the selection of a voice communication effectiveness measure for implementation
into the Armstrong Aerospace Medical Research Laboratory, Biological Acoustics Branch
(AAMRL/BBA), Performance and Communications Research and Technology (PACRAT) facility.
Research to date by AAMRL/BBA has been concerned with the intelligibility of individual words
transmitted and received in the presence of various noise and interfering modulations. The
research has proceeded to the point where data are needed on the effects of interfering with
interactive voice communications where relevant information is passed both ways over a
communication link. To gather these data, new response tasks were required other than the word
intelligibility tasks presently used.

Based on a literature review of speech intelligibility, human performance/workload, and information
theory, a performance task was selected for the measure of voice communication effectiveness.
The selected performance task is an interactive voice communication scenario with high
verbal demands. The communication scenario (primary task) utilizes a database of confusable
words. The words in the database have been analyzed for mutual information and entropy.
Researchers can employ the scenario as an independent voice communication effectiveness measure,
or it can be used in conjunction with a secondary task. The secondary task, selected for use
with the primary task, is a compensatory tracking task. Dependent variables from the primary and
secondary task include response time, errors, number of requested repeats, timeouts, and
root-mean-square error.

As  the  voice  communication  effectiveness  measure  is  implemented  on  the  PACRAT  facility, 
research  will  be  required  to  determine  its  validity  and  reliability.  Additional  recommendations 
concern  the  expansion  of  the  family  of  secondary  tasks. 
Section 1

INTRODUCTION 


1.1.  BACKGROUND 

Air  and  ground  crew  voice  communications  may  be  degraded  by  a  variety  of  environmental  and 
systems  factors.  Such  factors  may  include  electrical  or  acoustical  noise,  radio  interference, 
jamming,  communication  signal  processing,  as  well  as  various  other  forms  of  interference.  As 
communication  is  a  vital  part  of  the  flight  environment,  research  activities  attempting  to  identify  and 
quantify  potentially  degrading  elements  of  the  operational  environment  must  be  maintained. 
Analytical studies of communication system performance and the effect of environmental influences
on those systems are necessary. Such studies are possible in controlled laboratory environments
where special instrumentation can be used to create the elements of the human factors and
communication system networks being investigated.

The  Harry  G.  Armstrong  Aerospace  Medical  Research  Laboratory/Biological  Acoustics  Branch 
(AAMRL/BBA)  has  been  engaged  in  a  long-term  research  program  sponsored  by  the  Air  Force 
Electronic  Warfare  Center  (AFEWC)  and  the  Joint  Electronic  Warfare  Center  (JEWC)  to  conduct 
such  investigations.  The  majority  of  this  research  has  investigated  the  effects  of  interference  on  the 
intelligibility of individual words transmitted or received over a communication channel. Data
collection instruments have included the Modified Rhyme Test (MRT), the Diagnostic Rhyme Test
(DRT),  and  the  Coordinate  Response  Measure  (CRM). 

1.2.  PURPOSE  OF  RESEARCH 

The results of AAMRL/BBA's research indicate a need for new research on the effects of interference
on interactive voice communication situations, that is, situations where information is
passed two ways over a communication channel. Specifically, the amount of intelligibility required
to perform various tasks and the amount of time required to perform those tasks must be determined.
To investigate this area, new response tasks other than the standard intelligibility tests previously
mentioned are needed. AAMRL/BBA requires that these new metrics employ standard
measures  of  performance  such  as  total  time  needed  to  complete  a  task,  number  of  repeats 
requested,  and  number  of  errors  made  in  completing  a  task. 

To develop these new performance metrics, Systems Research Laboratories, Inc. (SRL), was
tasked with the following: (1) a survey of the literature concerning performance measures of voice
communication effectiveness (VCE), (2) an evaluation of various performance measures of VCE,
and (3) the development of a new VCE metric to be implemented in AAMRL/BBA's test facility.
This report describes SRL's efforts associated with these tasks. Sections 2 through 6 describe the
results of the literature survey. Section 7 describes the performance measures considered for
implementation in the VCE test facility. Conclusions and recommendations for future research are
provided in Section 8.

1.3.  VCE  TEST  FACILITY 

The  Performance  and  Communications  Research  and  Technology  (PACRAT)  laboratory,  located 
in the basement of Building 441 at Wright-Patterson Air Force Base (WPAFB), is the facility used
by  AAMRL  for  VCE  research. 

The  PACRAT  facility  currently  consists  of  seven  individual  subject  communication  stations,  one 
experimental  control  station,  and  a  high  intensity  sound  system  capable  of  duplicating  operational 
acoustical  environments.  The  seven  subject  stations  are  housed  in  a  single  large  reverberation 
chamber.  The  control  station  and  the  sound  system  control  panel  are  located  in  a  room  adjacent  to 
the  reverberation  chamber. 

Each of the seven subject stations is a modified aircraft simulator shell with communication,
display, input, and data acquisition capabilities as depicted in Figure 1. The display devices include
four CRT screens: a large 13-inch screen and three smaller 9-inch screens (Figure 2). All CRT
screens have color and graphics capabilities. Each subject station has two communication
addresses to which it will respond. One address is common to all stations. By using this address
and a single message, all stations can simultaneously receive the same information. The second
address is specific to individual stations. Using this address, different messages may be
simultaneously sent to different stations. The subject stations also have two different response
systems. The first system consists of 60 pushbuttons, 20 per small CRT (five on each side), as can
be seen in Figures 1 and 2. The second response system is an F-16 style force joystick with
push-to-talk and electric trim switches. This system may be used to respond to information displayed on
the large CRT screen. Each of the seven subject stations is also compatible with standard Air
Force headgear and respiration systems.

The  experimental  control  station  or  central  processing  unit  controls  each  of  the  individual  stations 
and  conducts  the  individual  testing  sessions.  The  control  unit  is  responsible  for  presenting  test 
material,  monitoring  participant  (both  sender  and  receiver)  activity,  and  recording,  storing,  and 
analyzing  subject  responses. 
Figure 1. Photo of a Subject Test Station in the VCE Test Facility


Figure 2. Photo of the Display Devices on the Subject Test Station (The large 13-inch
CRT screen and the three smaller 9-inch screens all have color and graphic
capabilities.)


The  PACRAT  sound  system  is  comprised  of  a  noise  generator  and  a  spectrum  shaper  capable  of 
generating  almost  any  desired  noise  environment  within  the  human  audio  frequency  range.  This 
permits  accurate  reproduction  within  the  PACRAT  test  chamber  of  the  ambient  and  environmental 
noise  conditions  of  specific  operational  situations.  Speaker  banks  are  located  in  the  specially 
designed  and  constructed  reverberation  chamber.  This  chamber  is  constructed  to  maximize  the 
uniformity  of  the  level  of  noise  distributed  throughout  the  room. 

1.4.  LITERATURE  SEARCH 

SRL utilized existing government and commercial databases to provide a survey of the literature
pertaining to the evaluation of voice communication effectiveness. Based upon the research
requirements of AAMRL/BBA and test facility characteristics, several topic categories were selected for
search. These categories included speech communication, human information processing, operator
performance measures, tactical scenarios, and communication theory. Table 1 displays the
subcategories searched within each topic area. Computer searches of each area were conducted on
DIALOG's Aerospace and Conference Proceedings Index databases; the NASA and NTIS technical
reports databases; and NERAC's Engineering Index, Biological Abstracts, and INSPEC
databases on conference papers, journals, and news reports. Manual searches of citation and reference
indices  included  the  International  Technical  and  Scientific  Index,  Medicus  Index,  Psychological 
Abstracts  Index,  Science  Citation  Index,  and  Social  Science  Citation  Index.  Manual  searches  of 
the  holdings  of  the  Wright  Research  Development  Center  (WRDC)  Technical  Library,  the  Wright 
State  University  library,  and  the  University  of  Dayton  library  were  also  conducted.  The  literature 
search  yielded  several  hundred  sources  which  are  documented  in  the  bibliography  attached  to  this 
report.  Sections  2  through  6  describe  the  results  of  the  literature  search  in  detail. 

1.5.  VCE  PERFORMANCE  MEASURE  DEVELOPMENT 

Based  upon  the  results  of  the  literature  review  and  the  constraints/requirements  of  the  PACRAT 
test  facility,  three  potential  performance  measures  for  VCE  research  were  developed.  Two  of  these 
tasks  have  been  selected  for  implementation.  Section  7  gives  a  detailed  description  of  each  of  these 
tasks. 




TABLE  1.  LITERATURE  SEARCH  SUBCATEGORIES 


Speech  Communication 

•  Verbal  Communication 

•  Noise  and  Speech 

•  Speech  Intelligibility  Measures 

•  Communication  Research 

•  Applied  Aviation  Noise  and  Communication  Research 

•  Synthetic  Speech  Technology 

Human  Information  Processing 

•  Perception 

•  Memory 

•  Learning 

•  Attention 

•  Language  Specialization 

•  Decision  Making 

•  Problem  Solving 

•  Auditory  Information  Processing 

•  Models  of  Processing 

Operator  Performance  Measures 

•  Performance  Measures  of  Behavior 

•  Performance  Measures  of  Psychological/Psychophysiological  Processes 

•  Operator  Workload 

•  Existing  Performance  Batteries 

Tactical  Scenarios 

•  General  Tactical  Profiles 

•  Communications  Oriented  Profiles 

Communication Theory

•  Language 

•  Speech  Analysis 

•  Information 

•  Communication  Logic 

•  Mathematical  Theories  of  Communication 
Section 2

SPEECH  COMMUNICATION 

2.1.  INTRODUCTION 

Available speech communication literature describes a variety of topics. These topics include the
description of the physical components of speech, such as frequency and intensity (Chapanis,
1965; Kryter, 1984, 1985); the linguistics of speech (i.e., phonemes, syllables, words, vocabularies,
and messages); the methods of speech production and articulation; auditory perception and
information processing (Carterette and Friedman, 1976; Cole, 1980; Hawkins and Presson, 1986;
Jusczyk, 1986; McCormick, 1976); and the effectiveness of speech communication (Chapanis,
1965; Harris, 1979; Kryter, 1984, 1985; Van Cott and Kinkade, 1972; von Gierke and Nixon,
1985).

Since this report is concerned specifically with the effectiveness of verbal speech communication,
the following sections focus on research documenting that topic: Section 2.2 defines human verbal
communications, Section 2.3 describes the various methods of measuring speech intelligibility,
Section 2.4 discusses research on interactive voice communication, Section 2.5 describes synthetic
speech, and Section 2.6 reviews the literature on aviation noise and communication research. It
should be noted that there exists very little published research on interactive communication.
However, many components of interactive communication such as speech intelligibility, environmental
effects, and operational factors are well documented.

2.2.  VERBAL  COMMUNICATION 

Human verbal communication exists in two forms: unidirectional (i.e., noninteractive) and
interactive communications. Unidirectional communication describes communication in which the
person to whom the message is addressed is a passive recipient. The receiver of this form of
communication can in no way affect the communicator, the communication process, or the content of
the message that is received. Examples of unidirectional communication include speeches,
lectures, and television broadcasts. Interactive communication describes situations in which more
than one of the participants are both senders and receivers of information. Interactive
communication is not passive. Participants can affect the other communicators, the process itself, and the
content of the message. Examples of interactive communication include two-way radio
transmissions, arguments, telephone conversations, and human-computer dialogue.
Verbal communication, both unidirectional and interactive, is crucial to the successful performance
of  innumerable  tasks  in  most  aerospace  operations.  Measurement  techniques  for  determining  the 
adequacy  of  verbal  communication  then  become  important.  Measurement  techniques  described  in 
the literature include both "voice communications effectiveness" and "speech intelligibility"
measures. These two terms are often used interchangeably. "Voice communications effectiveness"
can  be  defined  as  the  efficacy  of  verbal  communication  while  "speech  intelligibility"  may  be 
defined  as  the  understanding  of  spoken  words.  Despite  the  similarity  in  these  definitions,  there  is 
an  important  difference  in  their  meaning.  The  term  "voice  communications  effectiveness"  not  only 
includes  the  intelligibility  or  understandability  of  speech,  but  also  implies  that  there  is  a  response 
(or  some  performance)  made  by  the  receiver  based  upon  the  intelligibility  of  the  message. 

Both  the  intelligibility  of  speech  and  the  effectiveness  of  voice  communications  can  be  influenced 
by  a  variety  of  environmental,  human,  message,  and  system  factors.  Environmental  factors 
include  noise,  vibration,  acceleration,  stressors,  and  task  requirements.  Human  influences  include 
speech  habits,  dialects,  word  usage,  language  familiarity,  hearing  loss,  communication  experience, 
motivation, workload, and emotional state. Elements of the message which influence voice
communications effectiveness include vocabulary size, vocabulary familiarity and frequency, message
redundancy,  message  presentation,  and  context.  Equipment  factors  include  interference  with  the 
clarity,  volume,  etc.,  of  the  speech  signal,  and  interference  with  the  subject's  auditory  capabilities. 

2.3.  SPEECH  INTELLIGIBILITY  MEASURES 

A variety of standardized methodologies are described in the literature for measuring the
performance of voice communication systems. Methodologies exist for measuring both the entire
communication system and its various individual elements. Both subjective measures, which determine
the percentage of a given speech sample that is correctly perceived by a receiver, and physical
measures of the system and the environment exist. Relating the subjective measures to the physical
measures allows the effectiveness of speech communication to be assessed.

2.3.1.  Physical  Predictors  of  Intelligibility 

Physical measures of the system and environment used to predict speech intelligibility include the
A-weighted sound level [dB(A)], speech interference level (SIL), noise criteria (NC), and the
articulation index (AI). SIL, dB(A), and NC measures will not be discussed here as they do not
provide very comprehensive assessments of intelligibility (see Webster, 1979, for descriptions of
these measures).
2.3.1.1.  The Articulation Index


The AI is perhaps the most widely used of the physical predictors of speech intelligibility.
Calculation of the AI is based upon determination of a weighted signal-to-noise ratio from the level of the
speech  signal  and  the  noise  in  the  environment.  The  difference  between  the  level  of  the  speech  and 
the  level  of  the  noise  is  measured  in  20  contiguous  bands  of  frequencies.  These  frequency  bands 
contribute  equally  to  speech  intelligibility  when  all  are  at  optimal  gain.  The  average  difference 
between  signal  and  noise  (across  all  bands)  is  then  normalized  to  yield  a  value  between  0  and  1.0. 

A value of 0 indicates the listener will rarely be able to understand speech in the given
environment, while a value of 1.0 indicates potentially perfect perception by the listener. The American
National  Standard  Methods  for  the  Calculation  of  the  Articulation  Index  (ANSI  S3.5-1969) 
describes  detailed  instructions  for  calculation  of  the  AI. 
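
The band-averaging step described above can be illustrated with a short Python sketch. It is not the ANSI procedure itself: the function name, the 30 dB usable range per band, and the clipping of each band difference to that range are assumptions made for illustration, and the standard should be consulted for the actual band limits and corrections.

```python
def articulation_index(speech_levels_db, noise_levels_db, usable_range_db=30.0):
    """Illustrative AI estimate from per-band speech and noise levels.

    Assumptions (not taken from the report): each band's speech-minus-noise
    difference is clipped to 0..usable_range_db, and all bands carry equal
    weight, as in the equal-contribution 20-band description.
    """
    if not speech_levels_db or len(speech_levels_db) != len(noise_levels_db):
        raise ValueError("speech and noise must have the same nonzero number of bands")

    total = 0.0
    for speech_db, noise_db in zip(speech_levels_db, noise_levels_db):
        difference = speech_db - noise_db
        # Differences below 0 dB contribute nothing; above the usable
        # range they contribute no more than the maximum.
        total += max(0.0, min(usable_range_db, difference))

    # Normalize the average band difference to the 0..1.0 interval.
    return total / (usable_range_db * len(speech_levels_db))


# Hypothetical example: speech 15 dB above the noise in all 20 bands.
speech = [65.0] * 20
noise = [50.0] * 20
print(articulation_index(speech, noise))  # 0.5
```

Under these assumptions, speech that exceeds the noise by the full usable range in every band yields 1.0, and speech at or below the noise in every band yields 0.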

2.3.1.2.  The Speech Transmission Index

Steeneken and Houtgast (1980, 1981; Houtgast and Steeneken, 1981) have developed a Speech
Transmission Index (STI), which is an extension of the AI. The STI is based on the Modulation
Transfer Function (MTF) of a transmission channel, and is used as a physical method for
measuring speech-transmission quality. A study by Steeneken (1987) compared the STI with the DRT
and a Consonant-Vowel-Consonant (CVC) word test to obtain speech intelligibility scores for
diagnostic information related to the type of deterioration to which the speech signal was subjected.
He found that the STI provided better diagnostic information for the evaluation and classification of
speech channels than either the DRT or the CVC word test.

2.3.2.  Subjective  Measures  of  Intelligibility 

Measures of intelligibility which are based upon psychoacoustic measurements of the communication
system or the environment can be divided into four main classes: (a) nonsense syllables, (b)
spondaic words, (c) sentences, and (d) monosyllabic words (Chapanis, 1959). Tests using
monosyllabic words are further subdivided into phonetically balanced (PB) word tests (ANSI, 1960)
and  rhyme  tests  like  the  Fairbanks  Rhyme  Test  (Fairbanks,  1958),  the  Modified  Rhyme  Test 
(MRT)  (House,  Williams,  Hecker,  and  Kryter,  1965),  and  the  Diagnostic  Rhyme  Test  (DRT) 
(Voiers,  1968,  1977).  The  DRT  has  since  been  computerized  by  the  U.S.  Army  to  test  armored 
vehicle  intercommunications  systems  (Mayer,  1985). 
2.3.2.1.  Nonsense Syllables


Nonsense syllables (e.g., monz, nan, fook) have been successfully used to determine the effectiveness
of specific communication devices in transmitting particular speech sounds (Beranek, 1949;
Chapanis, 1959). Figure 3 depicts the relationship of nonsense syllables to words and sentences for
evaluating such transmission equipment. Unfortunately, using nonsense syllables requires extensive
training for both talkers and listeners. Talkers have to learn to correctly pronounce the fundamental
speech sounds that comprise the test, and listeners must learn to recognize these sounds and
be able to record the associated phonetic symbols.


[Figure 3 plot not reproduced; axis label: Articulation Index]

Figure 3. A Comparison of Speech Intelligibility Measures with the Articulation Index
(ANSI S3.5-1969)
2.3.2.2.  Spondaic Words


Spondaic words, or spondees, are primarily used to determine speech level settings on equipment
to  achieve  the  threshold  of  detection  by  listeners.  The  spondees  themselves  are  two-syllable  words 
which  are  spoken  with  equal  stress  on  each  syllable  (e.g.,  airplane,  woodchuck).  These  words 
reach  the  listener's  threshold  of  hearing  within  a  very  narrow  intensity  range.  This  allows  for  high 
precision  in  the  experimenter's  measurements. 

2.3.2.3.  Sentences

Sentence tests, like spondaic word tests, are of rather limited use for testing communication
equipment. Because sentences have certain inherent characteristics (meaning, context, rhythm) which
words and syllables do not have, sentence tests typically yield very high intelligibility scores.

Since these scores are usually high, communication systems must differ greatly to achieve a
significant difference in scores (Beranek, 1949). Sentence tests are, however, useful for testing the
maintenance of loudness levels, and for evaluating the rate, inflection, and stress patterns of
talkers' speech. The sentence lists used for testing are normally one of two kinds. They are either
questions requiring an answer from the listener (e.g., What letter comes after "Q"?), or statements
which must be recorded by the listener (e.g., Take the cards from the deck, you bum). For
questions, wrong answers are scored as errors. For statements, only five key words (predetermined
and  underlined  for  the  talker)  are  checked  for  correctness.  Cliches,  proverbs,  popular  phrases,  and 
very  frequently  used  words  are  not  used  in  the  sentences  (Beranek,  1949;  Chapanis,  1959;  Egan, 
1948;  Kalikow  and  Stevens,  1977;  Kryter,  1972). 

2.3.2.4.  Monosyllabic Words

The  most  commonly  used  test  materials  for  determining  speech  intelligibility  are  monosyllabic  or 
one-syllable  words.  As  stated  previously,  monosyllabic  words  are  used  in  the  PB  word  tests  and 
rhyme  tests. 

2.3.2.4.1.  Phonetically Balanced Words

Typical PB word lists consist of 50 words each. A set of 20 of these lists is provided by the
USA Standard Method for Measurement of Monosyllabic Word Intelligibility (ANSI S3.2-1960).
The frequency of occurrence of the types of speech sounds (e.g., fricatives, glides, nasals) is
approximately the same as in normal everyday speech; hence, the lists are deemed "phonetically
balanced." The words in each list are also approximately equal in difficulty; thus, if an average
intelligibility score of 50 percent is achieved by a test group, then very few of the words will be
extremely easy or difficult to understand (Beranek, 1949). The PB word lists of ANSI S3.2-1960
should each be randomly reordered before each use. For best results in testing, all 20 lists
(1000 different words) should be used (see Figure 3).

2.3.2.4.2.  Modified Rhyme Test

The MRT is the most commonly used monosyllabic word test. The MRT normally consists of 50
numbered sets of six words on an answer sheet for the listener, and 50 numbered single words,
one word taken from each set of six words on the listener's list, for the talker. The talker
announces the key word in a carrier sentence like "Number (of the key word), you will mark the
(key word) please" (Kryter, 1972), or a carrier phrase like "Number (of the key word) is (key
word)" (House et al., 1965). Of the 50 sets of words, 25 sets are such that the final consonantal
element is varied, and 25 sets vary the initial consonantal element. An example of each type of set
is as follows:


1.  bat   bad   back   bass   ban   bath

2.  look  took  shook  cook   hook  book

The  listener's  answer  sheet  is  scored  by  counting  the  number  of  words  correctly  marked  for  the 
test.  This  amount  is  then  corrected  for  chance  guessing  by  using  the  following  formula  (Kryter, 
1972): 

$$\%\ \text{correct} = 2 \left[ \text{No. right} - \frac{\text{No. wrong}}{5} \right]$$
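
As a worked illustration, the chance correction can be sketched in Python. The sketch assumes the formula is read as (number right minus one fifth of the number wrong) multiplied by 2, which follows from 50 items with six alternatives each; the function and parameter names are hypothetical and are not part of the MRT procedure.

```python
def mrt_percent_correct(num_right, num_wrong, num_items=50, num_alternatives=6):
    """Percent correct on an MRT list, corrected for chance guessing.

    Assumed reading of the formula: each wrong answer is divided by the
    number of incorrect alternatives (6 - 1 = 5), and the corrected count
    is scaled to a percentage (100 / 50 = 2 for a 50-item list).
    """
    if num_right < 0 or num_wrong < 0 or num_right + num_wrong > num_items:
        raise ValueError("right and wrong counts must fit within the list length")

    corrected_right = num_right - num_wrong / (num_alternatives - 1)
    return 100.0 * corrected_right / num_items


# Hypothetical listener: 40 right and 10 wrong on a 50-item list.
print(mrt_percent_correct(40, 10))  # 76.0
```

With this reading, a listener who guesses on every item (about one sixth right) scores close to zero, and a perfect answer sheet scores 100 percent.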

Like the PB word lists, each MRT list should be randomly generated between sets and within each
set for each test. The MRT words are not phonetically balanced to reflect everyday usage, but the
MRT is still considered useful and efficient. This is because it requires perception of consonantal
sounds. These sounds are difficult to transmit successfully and, therefore, important to
intelligibility (Kryter, 1972).
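
The randomization of an MRT list between sets and within each set might be sketched as follows. This is only an illustration: the function name and the use of Python's standard random module are assumptions, and the two word sets shown are the examples given earlier rather than a complete list.

```python
import random


def randomize_mrt_list(word_sets, rng=None):
    """Return a new MRT list with sets and words in random order.

    word_sets is a list of six-word sets. The input is left unchanged.
    rng is an optional random.Random instance, useful for repeatable tests.
    """
    rng = rng or random.Random()
    # Shuffle the words within each set (sample returns a new list).
    shuffled_sets = [rng.sample(words, len(words)) for words in word_sets]
    # Shuffle the order in which the sets are presented.
    rng.shuffle(shuffled_sets)
    return shuffled_sets


# Two example sets from the text; a full list would hold 50 sets.
example_sets = [
    ["bat", "bad", "back", "bass", "ban", "bath"],
    ["look", "took", "shook", "cook", "hook", "book"],
]
for number, words in enumerate(randomize_mrt_list(example_sets), start=1):
    print(number, " ".join(words))
```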

2.3.2.4.3.  Diagnostic Rhyme Test

The  DRT  word  lists  were  developed  to  test  consonant  discriminability  with  six  features  of  a 
phonemic  taxonomy:  (1)  voicing,  (2)  nasality,  (3)  sustention,  (4)  sibilation,  (5)  graveness,  and 
(6) compactness (Voiers, 1968, 1977). There are 96 rhyming word pairs used in the DRT that
differ phonemically on their initial consonants. Examples of the word pairs for each feature are
as follows (Voiers, 1977):
Feature        Example A      Example B

Voicing        Dint-Tint      Zoo-Sue
Nasality       Nip-Dip        Moot-Boot
Sustention     Thick-Tick     Foo-Pooh
Sibilation     Sing-Thing     Juice-Goose
Graveness      Fin-Thin       Moon-Noon
Compactness    Gill-Dill      Coop-Poop

Together the six phonemic perceptual features provide an overall gross measure of speech
intelligibility, although, if necessary, they can be measured separately. The word pairs are usually
presented so that each feature appears twice to each listener for each trial. The listener is given a
pencil and a list of word pairs to be announced, and then marks out the one word of each pair that
he/she perceives to have been spoken. The overall speech intelligibility score is adjusted for
guessing with the following correction formula (Voiers, 1977):

$$S = \frac{100\,(R - W)}{T}$$

where  S  is  the  adjusted  percent  of  correct  answers,  R  is  the  number  of  right  answers  marked,  W  is 
the  number  of  wrong  answers  marked,  and  T  is  the  total  possible  number  of  correct  answers. 
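
The DRT guessing correction, with S, R, W, and T as defined above, can be sketched in Python. The function name and the input checks are illustrative assumptions rather than part of the DRT procedure.

```python
def drt_adjusted_score(num_right, num_wrong, total_items):
    """Adjusted percent correct, S = 100 * (R - W) / T.

    With two alternatives per item, each wrong answer cancels one right
    answer, so pure guessing gives a score near zero.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if num_right < 0 or num_wrong < 0 or num_right + num_wrong > total_items:
        raise ValueError("right and wrong counts must fit within total_items")

    return 100.0 * (num_right - num_wrong) / total_items


# Hypothetical listener: 72 right and 24 wrong out of 96 word pairs.
print(drt_adjusted_score(72, 24, 96))  # 50.0
```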

2.4.  INTERACTIVE  VOICE  COMMUNICATIONS  RESEARCH 

The  major  focus  of  research  on  interactive  voice  communications  has  been  on  human-to-computer 
interaction in voice interactive systems. The small amount of work that has been done on
human-to-human interactive voice communications has been in conjunction with basic research on
human-computer interactions. Chapanis (1971) points out that before a truly interactive computer system
can  be  developed,  it  is  necessary  to  better  understand  the  interaction  between  human  beings 
engaged  in  communication. 

Chapanis,  Ochsman,  Parrish,  and  Weeks  (1972)  described  experiments  to  study  interactive 
communication  of  two-person  teams  during  cooperative  problem-solving.  They  studied  these 
effects  using  four  communication  modes:  (a)  typewriting,  (b)  handwriting,  (c)  voice,  and  (d) 
"communication-rich."  The  "communication-rich"  mode  entailed  two  subjects  communicating, 
face-to-face,  in  any  way  they  wanted.  In  the  other  modes,  the  subjects  were  separated  by 
partitions which included holes to pass notes written during tests of the handwriting mode of
communication. The typewriting mode (using teletypewriters) was further split into two groups with
one  group  composed  of  inexperienced  typists  and  the  other  group  composed  of  experienced  typists 
(i.e.,  typists  having  completed  at  least  a  1-year  course  of  high  school  typing). 
For each two-person team, one subject was designated as the information source or "source," and
the other subject was deemed the information seeker or "seeker." The source subject was to be
considered as a hypothetically ideal computer, and the seeker subject as the user of that computer.

Two  problems  were  used  for  the  tests:  (1)  an  equipment  assembly  problem,  and  (2)  a  geographic 
orientation  problem.  In  the  equipment  assembly  problem,  the  seeker  was  to  assemble  a  trash  can 
carrier.  In  the  geographic  orientation  problem,  the  seeker  was  to  find  the  office  or  residence 
address  of  a  physician  closest  to  a  hypothetical  home  address.  Three  dependent  variables  were 
measured:  (a)  time  to  arrive  at  a  solution,  (b)  behavioral  measures  of  activity,  and  (c)  linguistic 
measures. 

The  results  indicated  that  the  two  voice  communication  modes  were  significantly  superior  for 
interactive communication. Overall, subjects in the "communication-rich" mode condition took the
shortest  time  to  arrive  at  a  solution,  with  a  mean  time  of  just  less  than  30  minutes.  Subjects  in  the 
voice  mode  condition  took  just  under  35  minutes  to  arrive  at  a  solution  (this  was  not  significantly 
different  from  the  communication-rich  mode).  Subjects  using  the  other  modes  took  nearly  twice  as 
long  to  solve  their  problems.  The  handwriting  mode  was  superior  to  the  typewriting  mode,  and 
experienced  typists  were  slightly  faster  problem  solvers  than  inexperienced  typists. 

Results  of  the  behavioral  measures  suggest  that  the  source  and  seeker  subjects  using  the  voice-only 
mode  spent  almost  equal  amounts  of  their  time  searching  for  and  sending  data  to  solve  the  problem; 
however,  the  source  subjects  did  spend  more  time  in  sending  data  for  the  equipment  assembly 
problem. 

The  linguistic  measures  (Chapanis,  Parrish,  Ochsman,  and  Weeks,  1977)  showed  that  many  more 
messages,  sentences,  and  words  were  used  in  the  voice  and  "communication-rich"  conditions  than 
in  the  other  conditions.  On  the  average,  the  subjects  using  the  oral  modes  talked  about  183  words 
per minute; however, the source subjects used longer messages (11.8 words per message versus
7.9  words)  and  longer  sentences  (6.7  words  per  sentence  versus  5.0  words)  than  did  the  seeker 
subjects. 

Chapanis and Overbey (1974) conducted a similar experiment using 32 college students as subjects.
The subjects were again assigned roles as either seekers or sources in one of the two adjacent
rooms used during the previous experiment. The subjects used a speaker-microphone system and/or
a teletypewriter system for communication. Four configurations of communication modes were used
during the tests. In a seeker-source relation, the four modes were: (a) voice-voice (V-V),
(b) voice-typewriter (V-T), (c) typewriter-voice (T-V), and (d) typewriter-typewriter (T-T).

Four problems were used in this experiment. Two were the same as in the previously mentioned
experiment. The other two were an information retrieval problem and an object identification
problem. In the information retrieval task, the seeker was to find five citations
of  different  newspaper  articles  relevant  to  a  given  topic  from  a  portfolio  of  newspapers  given  to  the 
source.  In  the  object  identification  task,  the  seeker  was  to  identify  and  obtain  a  replacement  for  a 
small  pilot  light  socket  from  a  large  number  of  different  sockets  kept  by  the  source. 

The results of these experiments confirmed the earlier findings that a voice mode of communication
was significantly better for problem solving than a typewriting mode. In the seeker-source
relationship, the rank order of the communication modes was as follows: (1) V-V, (2) V-T, (3) T-V,
and (4) T-T. Messages were exchanged about five times faster in the V-V mode (3.0 messages per
minute) than in the T-T mode (0.6 messages per minute).

Chapanis  (1975,  1976)  again  tested  interactive  communication  modes.  In  this  experiment,  ten  dif¬ 
ferent  communication  modes  (five  with  voice  and  five  without  voice  communications)  were  tested. 
Two  generalizations  resulted  from  this  experiment:  (1)  that  communications  problems  are  solved 
significantly  faster  when  verbal  communication  is  allowed,  and  (2)  that  problems  are  solved 
equally  well  in  voice-only  and  face-to-face  modes. 

2.5.  SYNTHETIC  SPEECH  AND  INTELLIGIBILITY 

A  number  of  experiments  have  investigated  the  effects  of  synthetic  speech  systems  on  intelligi¬ 
bility.  Porubcansky  (1985)  describes  the  increasing  interest  in  automated  speech  technology  for 
use  in  Air  Force  aircraft.  Of  the  two  types  of  speech  synthesis  production  (i.e.,  phonemic  synthe¬ 
sizer  and  encoded  speech  synthesis  system),  the  encoded  speech  synthesis  systems  produce  the 
most  intelligible  speech.  Thus,  the  Air  Force  has  focused  its  research  programs  on  the  develop¬ 
ment  of  the  best  possible  speech  synthesis  system  using  speech  waveform  encoding  techniques, 
such  as  linear  predictive  coding  (LPC). 

LPC  and  other  waveform  encoding  techniques  have  been  investigated  for  voice  communication 
effectiveness  by  McKinley  and  Moore  (1986).  McKinley  and  Moore  measured  the  speech  intelligi¬ 
bility  in  simulated  aircraft  cockpit  noise  with  ten  subjects  by  using  the  MRT.  They  found  that 
different audio bandwidths and bit error rates significantly affected the speech intelligibility for the
encoding  techniques  used. 

The DRT is still "widely used to evaluate digital voice systems" (Schmidt-Nielsen, 1987). The
problem that many researchers have with the DRT is that there is no frame of reference for
interpreting DRT scores in terms of everyday or operational performance measures. Schmidt-Nielsen
compared DRT scores with intelligibility levels of the International Civil Aviation Organization
(ICAO) spelling alphabet (e.g., Alpha, Bravo, Charlie) using an LPC algorithm. Her results showed
that ICAO intelligibility remained high until the DRT scores fell below 75 percent, at which point
the ICAO intelligibility dropped off quickly until the DRT scores reached 50 percent. At the
50 percent DRT level, the ICAO intelligibility level was about half.

Slowiaczek  and  Nusbaum  (1985)  examined  the  effects  of  speech  rate  and  pitch  contour  on  the 
perception  of  speech.  The  results  indicated  that  speech  rate  influenced  intelligibility  more  than 
pitch  contour.  Greene,  Logan,  and  Pisoni  (1986)  used  the  MRT  to  evaluate  intelligibility  of  eight 
off-the-shelf  text-to-speech  systems  as  compared  to  natural  speech.  The  results  showed  only  one 
of  the  systems,  DECtalk-Paul,  was  comparable  to  natural  speech. 

2.6.  AVIATION  NOISE  AND  COMMUNICATIONS  RESEARCH 

2.6.1.  General  Aviation  and  Communications  Noise  Research 

The various studies utilizing aircraft noise in conjunction with issues of speech communication fall
into two groups: studies done with a point of reference outside an aircraft (Arnoult and
Voorhees, 1980; Frohlich, 1981; Kryter and Williams, 1965; Pollack, 1958; Webster, 1965;
Williams, Mosko, and Greene, 1976), and those done from inside the aircraft/cockpit (Arnoult,
Voorhees, and Gilfillan, 1986; Lacey, 1973; Pratt, 1981; Wheeler and Halliday, 1981; Williams,
Forstall, and Greene, 1971). The studies of interest in this report are those done with a point of
reference inside the aircraft.

Williams,  Forstall,  and  Greene  (1971)  used  an  in-flight  manikin  to  evaluate  the  communications 
effectiveness  of  three  different  helmets.  Speech  intelligibility  tests  (viz.,  MRT)  were  transmitted  to 
six  subjects  along  with  the  manikin  in  an  airborne  C-45  aircraft.  Later,  on  the  ground,  the  flight 
was  simulated  by  reproducing  the  aircraft  cabin  noise  in  the  laboratory.  The  same  six  subjects  and 
two  groups  of  ten  listeners  were  retested  by  replaying  the  recordings  made  with  the  manikin.  The 
results showed very little difference in scores for the two test situations. Williams et al. concluded
that in-flight manikins could indeed be used to test the communication effectiveness of flight
helmets. The data also seem to suggest that intelligibility tests can be performed via laboratory
simulations with good results, and at less expense than actual in-flight measures.

To evaluate the effectiveness of a voice display mode for jet aircraft in an air combat maneuver
environment,  Lacey  (1973)  used  speech  intelligibility  tests  (i.e.,  MRT  and  operational  word  lists) 
in  conjunction  with  both  simulated  aircraft  noise  and  background  speech  interference.  Results 
showed  that  the  pilots  who  participated  as  subjects  understood  approximately  65  percent  of  the 
MRT  words  and  89  percent  of  the  operational  words.  Based  upon  these  results,  Lacey  suggests 
that  a  voice  advisory  system  may  be  feasible  during  air  combat  maneuvers. 

Wheeler  and  Halliday  (1981)  describe  a  laboratory  evaluation  of  an  active  noise  reduction  (ANR) 
system  for  flight  helmets.  Subjects'  performance  on  a  speech  intelligibility  test  (i.e.,  an  MRT)  was 
recorded  in  various  conditions  of  background  aircraft  noise.  One  half  of  the  subjects  performed 
the  intelligibility  test  while  wearing  the  ANR  helmet.  Depending  on  the  specific  noise  condition, 
the  ANR  system  appeared  to  reduce  noise  15  to  20  dB(A). 

Pratt  (1981)  used  simulated  aircraft  noise  (viz.,  helicopter)  and  tested  subjects  using  the  MRT  and 
the  Clarke's  Vowel  Test  (CVT)  to  measure  the  effectiveness  of  an  automated  multiple  choice  intelli¬ 
gibility  testing  system.  The  CVT  uses  single  syllable  words  (where  only  the  vowels  change 
between  words).  As  in  the  MRT,  subjects  had  to  choose  each  keyword  from  a  group  of  words, 
but only five words instead of six. For the MRT, the results showed no significant difference
between the automatic and the manual tests. The CVT did not fare as well: there was a significant
difference between the manual and automatic tests.

Arnoult, Voorhees, and Gilfillan (1986) investigated the effects of annoyance on speech
intelligibility in various backgrounds of simulated helicopter cabin noise. The test materials used
were complete sentences. The sentence tests were developed following the recommendations of
Hudgins, Hawkins, Karlin, and Stevens (as cited in Arnoult et al., 1986), with the exception that
all sentences were to be answered as either true or false. Altogether, 160 sentences were made.
These were presented in groups of ten for 16 sets, each set having five true and five false
statements, randomly arranged in each set. The sentences were prerecorded by a male speaker.
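
The set construction described above (160 true/false sentences in 16 sets of ten, five true and five false per set, in random order within each set) can be sketched in Python as follows. This is an illustration only: the sentence labels are placeholders, and the function name, the fixed seed, and the way sentences are assigned to sets are assumptions, not the procedure the original authors used.

```python
import random

def build_sentence_sets(true_items, false_items, n_sets=16, per_kind=5, seed=0):
    """Split true and false sentences into sets holding per_kind of each kind,
    then shuffle the order of the sentences within every set."""
    needed = n_sets * per_kind
    if len(true_items) != needed or len(false_items) != needed:
        raise ValueError("need exactly n_sets * per_kind items of each kind")
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is repeatable
    true_pool = list(true_items)
    false_pool = list(false_items)
    rng.shuffle(true_pool)
    rng.shuffle(false_pool)
    sets = []
    for i in range(n_sets):
        start, stop = i * per_kind, (i + 1) * per_kind
        chunk = true_pool[start:stop] + false_pool[start:stop]
        rng.shuffle(chunk)  # random arrangement of true and false statements
        sets.append(chunk)
    return sets

# Placeholder items as (label, truth value); the real sentence texts are not given here.
true_items = [(f"T{i:02d}", True) for i in range(80)]
false_items = [(f"F{i:02d}", False) for i in range(80)]

sentence_sets = build_sentence_sets(true_items, false_items)
print(len(sentence_sets), len(sentence_sets[0]))
```

Balancing true and false statements within every set means a listener who answers at random is expected to score 50 percent on each set.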

The  simulated  helicopter  cabin  noise  was  composed  of  two  components:  (1)  a  pink  noise  (PN) 
broadband  signal,  and  (2)  one  of  three  pure  tones  (PT)  at  650,  1900,  or  5000  Hz.  These  compo¬ 
nents  were  then  generated,  in  all  combinations,  at  four  sound  levels  [i.e.,  0,  60,  70,  and 
80  dB(A)].  The  sentences  were  presented  at  either  50  or  55  dB(A).  The  results  indicated  that  both 
noise  sources,  PN  and  PT,  and  their  interactions  were  significant  (p  <  .001)  regarding  intelligi¬ 
bility  and  annoyance.  The  PN  component  had  relatively  more  effect  on  intelligibility  loss,  and  the 
PT  components  caused  more  annoyance. 

Interactive communication degradation from audio jamming is an important concern for the
aerospace community. A classic test called the Michigan Map Test was developed in the 1950s (cited
in Bennighof, Farris, Lauderdale, Richard, and Wild, 1978). The map is made up of a criss-cross
pattern producing a field of diamond shapes with each corner (representing a town) designated by a
letter from a phonetic alphabet. A talker attempts to guide a listener through the diamond grid
along one of 972 predesignated routes. Each route represents 9.925 bits of information. The route
is transmitted when the jamming begins, and the time required for the receiver to travel to six
towns is measured. The measured time is the jamming effectiveness measure for each jammer/signal
ratio. The guideline for performance ranges from a base reference point of 2 seconds with no
jamming to 20 seconds (maximum) with jamming.
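
The 9.925-bit figure is consistent with treating the 972 routes as equally likely, in which case selecting one route conveys log2(972) bits. The short Python check below is a sketch under that assumption; the function name is illustrative and does not come from the cited work.

```python
import math

def bits_per_choice(n_alternatives: int) -> float:
    """Information, in bits, conveyed by selecting one of n equally likely alternatives."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)

# 972 predesignated routes, assumed equally likely (an assumption made for this check).
route_bits = bits_per_choice(972)
print(round(route_bits, 3))  # 9.925
```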

2.6.2.  AAMRL/BBA  Research 

The Communication Evaluation Facility, now known as the Voice Communication Research and
Evaluation System (VOCRES), located in the Biodynamics and Bioengineering Division of
AAMRL (McKinley, 1980, 1981), has been used extensively for testing the effectiveness of
communication equipment in various noise and jamming environments. Using this facility, Moore,
McKinley, Mortimer, and Nixon (1978) evaluated the word intelligibility of two
modulator/demodulator (modem) systems of a spread spectrum communication system in the presence
of simulated F-15A cockpit noise. Moore et al. used various jamming conditions with cockpit noise
while administering the MRT. The results showed that increased jamming with cockpit noise did
degrade the MRT scores, and that the advantages of either modem were case specific, depending on
the jammer-to-signal power ratio.

Additional  jamming  research  has  been  conducted  in  the  VOCRES  facility.  Moore  (1981)  examined 
the comparative effectiveness of five different types of jammers. Two types of test materials were
used to measure intelligibility: (1) the MRT, and (2) a more operationally realistic word test
developed by Ascher et al. (cited in Moore, 1981). The results of both intelligibility tests showed that
the  dual  FM  swept  tones  jamming  signal  was  the  most  effective.  Also,  during  this  study,  reported 
later  by  Nixon,  McKinley,  and  Moore  (1982),  listeners  were  evaluated  for  training  effects  on 
increased  intelligibility  of  jammed  words.  The  results  indicated  that  training  did  improve  the 
listeners'  ability  to  recognize  jammed  words. 

Research  on  the  effect  of  aviation  cockpit  noise  on  word  intelligibility  without  jamming  has 
included  the  evaluation  of  various  radio  systems  (Moore,  McKinley,  and  Mortimer,  1979),  and 
in-flight headsets (Prohaska and Nixon, 1984). The ARC-34 and ARC-164 transceiver radio
systems were tested using the MRT and three levels (i.e., 95, 105, and 115 dB) of cockpit noise in
the VOCRES. The ARC-164 radio was found to perform better in all three noise levels. Various
nonstandard  in-flight  headsets  were  compared  against  each  other  and  the  standard  in-flight  headset 
(viz.,  H-157),  again  by  obtaining  MRT  scores  in  one-third  octave  bands  of  noise.  Results  favored 
the  nonstandard  headsets,  but  also  suggested  that  the  standard  headset  provided  more  attenuation, 
especially  at  the  higher  frequencies  tested. 

In  response  to  reports  from  aircrew  members  that  positive  pressure  breathing  affects  voice  com¬ 
munication,  Nixon  (1984)  studied  positive  pressure  breathing  under  various  conditions  of  simu¬ 
lated  aircraft  noise  in  the  VOCRES.  Using  the  MRT,  Nixon  found  that  speech  intelligibility  was 
not significantly degraded until the simulated cockpit noise reached 115 dB. There was, however,
a  trend  across  the  other  noise  conditions  suggesting  an  inverse  relation  between  breathing  pressure 
and  intelligibility. 

In addition to these studies, Moore and McKinley (1986) presented a review that described a
number of speech related studies being conducted by AAMRL/BBA. Their review includes, pertinent
to the last study on pressure breathing, a brief discussion concerning the effects of
acoustic-phonetic changes from acceleration. This is relevant because, as Nixon (1984) related,
communication in an actual aircraft is done under varying degrees of G-force along with the pressure
breathing experienced by aircrew members. Moore and McKinley also presented other AAMRL/BBA data
related to modern issues of speech coding, and mentioned some of the experiments being done on
synthetic speech and speech recognition devices.

Section 3

HUMAN  INFORMATION  PROCESSING 


The information processing requirements of piloting tasks are great. Modern aircraft, particularly
jet  fighters,  allow  pilots  access  to  a  great  amount  of  information,  yet  often  give  them  little  time  to 
process  it.  This  information  may  be  displayed  and  acted  upon  in  a  number  of  ways.  In  aircraft, 
visual  and  auditory  displays  of  information  are  most  common.  These  displays  may  require  spatial 
or  manual  transformations  of  information  and  manual  and/or  vocal  responses.  The  result  is  the 
requirement  to  perform  complex,  difficult,  highly  cognitive  tasks  in  limited  amounts  of  time. 
Although  a  thorough  review  of  the  human  information  processing  system  is  beyond  the  scope  of 
this  report  (see  Boff,  Kaufman,  and  Thomas,  1986;  Lindsay  and  Norman,  1977),  the  following 
section  discusses  some  of  the  more  critical  aspects  of  the  human  information  processing  system  as 
it  relates  to  the  performance  and  measurement  of  pilot  communication  tasks.  Section  3.1  briefly 
discusses  research  on  auditory  information  processing,  and  Section  3.2  describes  the  major  pro¬ 
cessing  models  upon  which  operator  performance  and  workload  assessment  theories  are  based. 
The  development  of  a  performance  metric  assessing  voice  communication  effectiveness  should  be 
based  upon  the  findings  described  in  the  literature. 

3.1.  AUDITORY  INFORMATION  PROCESSING 

A majority of a pilot's communication activities involve some auditory component. Many tasks
require the pilot to identify and respond to incoming verbal messages and comments.
Unfortunately, a large amount of incoming verbal information may be degraded due to electrical and
acoustical noise, radio interference, and jamming (McKinley, 1980). Such environmental
interference often increases the difficulty of the communication task, perhaps even making successful
completion  of  the  task  impossible.  The  remainder  of  this  subsection  discusses  the  ability  of 
humans  to  process  complex  auditory  information,  specifically  speech  sounds.  Topics  to  be 
covered  include  auditory  attention  and  auditory  memory. 

3.1.1. Auditory Attention

Research  on  auditory  attention  was  initiated  in  the  early  1950s  based  on  the  need  to  better  under¬ 
stand  the  communication  behavior  of  air  traffic  controllers  and  pilots  who  needed  to  respond 
quickly  and  accurately  to  a  wide  range  of  both  visual  and  auditory  information.  A  majority  of  this 
research  has  focused  on  problems  resulting  from  two  of  the  tasks  required  of  such  operators.  The 
first task is the tracking of one of several simultaneously presented messages (selective listening,
focused attention). The second is the tracking of several simultaneously presented messages or
signals (divided attention).


3.1.1.1. Focused Attention

Selective  listening  or  focused  attention  tasks  require  the  operator  (the  listener)  to  focus  on  one  of 
two  or  more  simultaneously  presented  messages  while  disregarding  the  other(s).  In  such  situa¬ 
tions,  the  listener  must  separate  the  components  of  the  wanted  message  from  those  of  the  unwanted 
background  (other  messages).  The  "cocktail  party  effect"  (Cherry,  1966)  is  such  an  example.  The 
"cocktail  party  effect"  refers  to  the  ability  of  party  guests  to  attend  to  one  conversation  although 
many  may  be  occurring,  even  if  that  conversation  is  more  distant  or  less  loud  than  others. 

The effectiveness of selection has been studied in two ways: (1) by comparing the detection,
recognition, and/or comprehension of auditory inputs presented alone with those presented under
simultaneous listening conditions; and (2) by assessing the effectiveness of ignoring/rejecting an
auditory input. Further discussion describes research on the comprehension of messages and the
effects  of  ignoring  messages.  A  review  of  other  research  may  be  found  in  Hawkins  and  Presson 
(1986). 

Research  has  identified  a  number  of  factors  (cues)  which  influence  selective  listening  performance: 
spatial  location  of  the  signal,  pitch  of  the  signal,  semantic  content  of  the  signal/message,  and 
intensity  of  the  signal. 

Spatial location (localization or lateralization) of a sound is determined in part by interaural time
(phase) and interaural intensity differences (differences in time of arrival and intensity of the
sound at the two ears). The importance of these cues in the performance of selective listening was
demonstrated by Licklider (1948). Licklider developed an improvement to a voice communications
headphone set used by pilots. When incoming verbal messages were altered so that the voice signal
was out of phase at the two ears, while the external masking noise (noisy environment) was left in
the same phase at the two ears, pilots were better able to avoid the masking effect of the external
noise. This reduction in masking associated with the separation of the apparent source locations of
the signal and noise is called the masking level difference.
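
The stimulus arrangement described above can be illustrated numerically: the noise is identical at the two ears, while the signal is inverted at one ear. The Python sketch below shows only that, in this configuration, the difference between the two ear signals contains the speech component and the sum contains the noise. It does not model binaural hearing, and the sample values and function name are hypothetical.

```python
def make_ear_signals(signal, noise):
    """Return (left, right): noise in phase at both ears, signal inverted at the right ear."""
    left = [n + s for s, n in zip(signal, noise)]
    right = [n - s for s, n in zip(signal, noise)]
    return left, right

# Hypothetical sample values; any two equal-length sequences would do.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
noise = [2.0, -1.0, 3.0, 0.5, -2.0, 1.5, -3.0, 0.25]

left, right = make_ear_signals(signal, noise)

# Half the interaural difference carries only the signal; the in-phase noise cancels.
difference = [(l - r) / 2 for l, r in zip(left, right)]
# Half the interaural sum carries only the noise; the out-of-phase signal cancels.
common = [(l + r) / 2 for l, r in zip(left, right)]

print(difference)
print(common)
```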

Spatial separation of message sources (free-field) or of the auditory images of messages
(headphones) has been used to reduce both the masking and confusability associated with presenting
simultaneous messages. Spieth, Curtis, and Webster (1954) investigated the effects of speaker
separation under free-field listening conditions. Subjects were presented with two simultaneous
messages, each spoken by a different voice of the same sex. Messages contained a code name, the
name  of  the  channel  calling,  the  number  of  the  caller,  and  a  question  about  the  visual  display  in 
front  of  the  subject.  The  subject's  task  was  to  report  the  channel  and  talker  calling,  and  answer  the 
question  posed.  In  some  conditions,  the  subject  had  the  option  to  switch  the  message  to  a  nearer 
speaker  or  to  the  headphones.  The  degree  of  speaker  separation  (0  degrees,  10  to  20  degrees,  or 
90  to  180  degrees)  was  maintained  in  these  conditions.  In  other  conditions,  visual  cues  (indicator 
lights  specifying  the  relevant  channel)  were  used.  Differential  frequency  filtering  of  the  messages, 
alone  or  in  combination  with  the  visual  cues,  comprised  a  final  set  of  conditions.  Results  of  the 
experiment  showed  that  performance  (correct  channel  identifications  and  correct  answers) 
improves  with  speaker  separation  except  under  conditions  where  the  task  is  already  quite  easy 
(i.e.,  visual  cues  and  filtering  are  present). 

The  experiment  by  Spieth,  Curtis,  and  Webster  (1954)  also  provides  an  example  of  how  pitch  can 
be  used  as  a  cue  in  selective  listening.  All  message  pairs  used  in  this  experiment  contained  both  a 
relevant  and  a  distracting  message.  High  pass  (all  frequencies  above  a  given  level  can  be  heard) 
and  low  pass  (all  frequencies  below  a  given  level  can  be  heard)  filtering  of  the  messages  was  used 
to  create  seven  dual-message  listening  conditions.  Results  indicated  that  filtering  significantly 
enhanced  performance  especially  in  conditions  where  no  other  cues  were  present  (i.e.,  spatial 
separation).  This  implies  that  until  the  point  at  which  filtering  begins  to  impair  the  intelligibility  of 
a  relevant  message,  procedures  which  enhance  the  distance  between  the  frequency  bands  of  rele¬ 
vant  and  distracting  signals  will  aid  in  selective  listening. 

Semantic  content  has  also  been  studied  as  a  cue  in  selective  listening  (Broadbent  and  Gregory, 
1964;  Miller  and  Selfridge,  1950;  Treisman,  1964).  Results  of  this  research  suggest  that  the 
semantic  structure  of  auditory  messages  is  as  useful  a  cue  for  selective  listening  as  is  spatial  separa¬ 
tion.  Selection  performance  may,  therefore,  be  improved  by  providing  semantic  differences 
between  relevant  and  irrelevant  information. 

Another factor which influences selective listening performance is the intensity of the auditory
signal(s). The effectiveness of signal intensity (of both relevant and distracting messages) as a cue
has been studied by Egan, Carterette, and Thwing (1954). In their experiment, subjects were
presented with two simultaneous messages, either monaurally or dichotically. Each message began
with a unique call sign. For example:

LANGLEY BASE ... next Tuesday we must vote ...

MITCHELL FIELD ... the fur of cats goes by many names ...

The subject's task was to reproduce all messages that followed the target call signs specified prior
to the experiment. The results showed that as the relative intensity of the attended message was
increased, selectivity improved. This improvement accrued more rapidly in conditions where the
messages were presented dichotically than in those where they were presented monaurally.

3.1.1.2. Divided Attention

The listener's task in situations requiring divided attention is quite different from that of selective
attention.  Rather  than  attending  to  only  one  of  several  messages,  the  listener  must  now  attend  to 
two  or  more  of  those  messages,  responding  to  each  as  needed.  A  majority  of  the  research  on 
divided  attention  has  attempted  to  determine  the  conditions  under  which  attention  can  be  success¬ 
fully  split  between  simultaneous  inputs.  This  research  suggests  that  in  situations  where  the  listener 
is  monitoring  two  channels,  yet  is  listening  for  a  single  target,  no  divided  attention  costs  will 
occur.  However,  when  listening  for  multiple,  independent  targets,  task  performance  will  depend 
on  (1)  the  listener's  perception  of  the  events  presented  through  channels  other  than  the  channel 
through  which  the  target  is  presented,  (2)  the  amount  of  practice  the  listener  has  had  on  the  task, 
and  (3)  the  modality  in  which  the  stimulus  inputs  are  presented  (auditory  or  auditory  plus  another 
modality). 

The  listener’s  reaction  to  events  in  the  off-channel  will  be  one  of  four  types:  (1)  a  hit  (the  off- 
channel  carries  a  target  that  is  correctly  identified),  (2)  a  miss  (the  off-channel  carries  a  target  which 
is  not  identified),  (3)  a  false  alarm  (a  nontarget  in  the  off-channel  is  identified  as  a  target),  and  (4)  a 
correct rejection (a nontarget in the off-channel is correctly identified as a nontarget). In general,
the listener's performance at identifying an input through a given channel is best when the response
to the off-channel event is a correct rejection, intermediate when it is a miss, and poorest when it
is a hit or false alarm.
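
The four response categories and the performance ordering just described can be written out as a small lookup. The Python sketch below is illustrative only; the function and dictionary names are assumptions, and the ordering is simply the qualitative one stated in the text.

```python
def classify_off_channel(target_present: bool, reported_target: bool) -> str:
    """Classify the listener's reaction to an off-channel event."""
    if target_present:
        return "hit" if reported_target else "miss"
    return "false alarm" if reported_target else "correct rejection"

# Qualitative on-channel performance associated with each off-channel reaction,
# as stated in the text above (no numerical values are implied).
ON_CHANNEL_PERFORMANCE = {
    "correct rejection": "best",
    "miss": "intermediate",
    "hit": "poorest",
    "false alarm": "poorest",
}

for present in (True, False):
    for reported in (True, False):
        outcome = classify_off_channel(present, reported)
        print(present, reported, outcome, ON_CHANNEL_PERFORMANCE[outcome])
```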

To  date,  little  research  has  investigated  the  effects  of  practice  on  divided  attention.  Ostry,  Moray, 
and  Marks  (1976)  found  practice  did  improve  performance  on  divided  attention  tasks.  Perfor¬ 
mance  in  this  study  did  not,  however,  increase  to  the  level  of  focused  attention  performance. 

The  general  conclusion  of  the  research  on  divided  attention  and  resource  modalities  suggests  that 
strong  divided  attention  effects  occur  with  heteromodal  stimulus  presentation  just  as  they  do  with 
homomodal  presentation.  Which  type  of  presentation  is  superior  remains  unclear.  Moray  (1988) 
ties  together  the  concepts  of  competing  resources  and  extended  practice.  Although  evidence  sug¬ 
gests  that  resources  are  separate  in  the  brain  (i.e.,  visual  and  auditory  processing  do  not  generally 
compete  for  the  same  neural  mechanisms),  they  are  not  completely  independent.  Some  amount  of 
interference  (either  in  the  form  of  delay  or  inaccuracy)  will  occur  when  attention  is  shared  between 
sense modalities. With extended practice, some aspects of processing can be automatized. This is
especially true if the stimuli in question consistently require the same response (i.e., consistent
mapping).  Automatization  of  these  processes  generally  leads  to  improved  performance. 

3.1.2.  Auditory  Memory 

Auditory  or  echoic  memory  is  the  memory  system  which  stores  the  physical  (acoustic)  properties 
of  auditory  inputs.  At  the  presentation  of  an  auditory  stimulus,  this  system  stores  a  representation 
of  that  stimulus  in  a  code  very  similar  to  the  original  input.  The  information  is  retained  until  it  can 
be  selectively  processed  and  recorded  into  short-term  memory. 

A  variety  of  factors  affect  the  retention  of  information  in  auditory  memory.  A  number  of 
researchers  have  shown  that  the  temporal  interval  between  a  memory  item  and  an  interfering  stimu¬ 
lus  will  affect  information  retention.  Hawkins  and  Presson  (1977)  and  Massaro  (1970)  have 
found  that  as  the  delay  between  the  presentation  of  a  test  stimulus  (an  auditory  tone)  and  a  masking 
stimulus  increases,  degradations  (i.e.,  reduced  recall  accuracy  of  the  test  stimulus)  in  echoic  mem¬ 
ory  retention  decrease  to  near  zero. 

Laterality  of  the  interfering  stimulus  item  (i.e.,  masking  tone)  also  appears  to  be  a  factor  affecting 
the  retention  of  information  in  echoic  memory.  The  auditory  system  is  extremely  sensitive  to  differ¬ 
ences  in  the  timing  and  intensity  of  stimuli  presented  to  both  ears.  Laterally  separating  the  test  and 
masking  stimuli  has  been  found  to  reduce  performance  degradations  that  are  caused  by  the  interfer¬ 
ence  stimuli  (Hawkins  and  Presson,  1977;  Massaro,  1970). 

3.2.  MODELS  OF  INFORMATION  PROCESSING 

A  large  number  of  models  of  human  information  processing  have  been  postulated.  This  section 
describes  the  major  models  of  information  processing  upon  which  human  performance  and  work¬ 
load  assessment  techniques  have  been  based. 

Broadbent  (1958)  described  a  limited  capacity  filter  model  of  human  information  processing.  In 
Broadbent's  model,  the  human  may  simultaneously  receive  input  directly  from  any  or  all  of  the 
senses  (parallel  processing).  This  information  is  transmitted  directly  from  the  senses  to  some 
short-term  storage  area  where  a  "selective  filter"  determines  which  information  is  to  be  processed 
through  the  limited  capacity  channel  (central  processing).  It  is  at  this  point  that  parallel  processing 
stops. The processing channel can now handle only one source of input at a time. Broadbent
proposed this theory based upon his studies of dichotic listening. Noticing that subjects' comprehension
of verbal messages decreased when subjects heard two different messages simultaneously
(one  in  each  ear),  Broadbent  proposed  that  subjects  have  the  capacity  to  listen  to  only  one  voice  at  a 
time. 

This  trade-off  between  attending  to  one  voice  or  another  may  also  be  explained  as  a  simultaneous 
sharing  of  attention  rather  than  a  switching  back  and  forth  (as  described  by  Broadbent).  Treisman 
(1964)  described  an  attenuation  theory  of  processing  to  explain  Broadbent's  results  as  well  as 
her  own.  This  theory  suggests  that  subjects  who  were  instructed  to  attend  primarily  to  one  of  two 
voices  would  be  able,  at  the  same  time,  to  allocate  a  small  portion  of  their  attention  to  the  other 
voice. With this theory, Treisman was able to account for subjects' apparent sensitivity to certain
kinds of information presented to the nonattended ear (e.g., their own name).

Norman  (1976)  went  further  to  suggest  that  the  selection  of  information  occurs  not  by  selectively 
blocking  or  filtering  sensory  information,  but  by  selectively  processing  information  already  evoked 
or  activated  in  memory  by  incoming  sensory  information. 

These  theories  and  others  (Cherry,  1953;  Moray,  1959)  came  to  be  known  as  "bottleneck"  models 
of  human  information  processing.  They  have  in  common  that  they  seek  to  determine  at  what  stage 
of  processing  a  parallel  system  (capable  of  processing  separate  channels  concurrently)  narrows  to  a 
serial  system  that  can  handle  only  one  channel  at  a  time.  Bottleneck  theories  can  be  divided  into 
two  general  classes:  early-selection  theories  (Broadbent,  1958;  Treisman,  1969)  that  consider  the 
bottleneck  to  occur  at  perception,  and  late-selection  theories  (Deutsch  and  Deutsch,  1963;  Norman, 
1968)  that  consider  the  bottleneck  to  occur  at  the  point  at  which  decisions  to  initiate  responses  (i.e., 
storage  of  information  in  long  term  memory,  rehearsal  of  information)  are  made. 

3.2.1.  Capacity  Theories 

Capacity  theories  of  human  processing  came  about  as  a  direct  result  of  human  factors  research  on 
mental  workload.  Capacity  theories  conceptualize  the  human  as  possessing  a  "pool"  of  processing 
facilities  or  "resources."  The  concept  of  resources  will  be  described  in  greater  detail  later  in  this 
section. The first capacity theory was presented by Knowles (1963). Knowles' theory, having
direct application to operator task performance, proposes that as a task (primary task) demands
more of an operator's resources (i.e., becomes more difficult), fewer of those resources are available
for successful concurrent performance of a second task (secondary task). Performance of this
second task, then, is expected to deteriorate.

The major distinction between bottleneck theories and capacity theories is that, rather than the
structures  of  the  human  processing  system  being  dedicated  only  to  one  task  at  a  time,  capacity  may 
be  allocated  in  varying  amounts  among  separate  activities  (i.e.,  numerous  tasks).  Many  other 
researchers  have  made  contributions  to  the  further  development  of  capacity  theory  (Moray,  1967; 
Moray,  Johannsen,  Pew,  Rasmussen,  Sanders,  and  Wickens,  1979).  This  research  allowed  the 
resource  metaphor  to  develop  from  a  concept  into  a  quantitative  theory  with  testable  predictions  and 
important  implications  for  the  measurement  of  human  behavior. 
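
The primary/secondary task trade-off predicted by a single-pool capacity theory can be made concrete with a small numerical sketch. The Python fragment below is illustrative only: it assumes one undifferentiated pool of capacity normalized to 1.0 and a linear relation between unmet demand and performance. Neither assumption comes from Knowles (1963) or the other cited work, and the function names are hypothetical.

```python
def spare_capacity(primary_demand, total_capacity=1.0):
    """Capacity left over once the primary task has taken what it needs.

    Assumes a single undifferentiated pool (an illustrative simplification).
    """
    if primary_demand < 0 or total_capacity <= 0:
        raise ValueError("demand must be >= 0 and capacity must be > 0")
    return max(0.0, total_capacity - primary_demand)


def secondary_task_performance(primary_demand, secondary_demand, total_capacity=1.0):
    """Fraction (0.0 to 1.0) of the secondary task's demand that can be met.

    The linear mapping from unmet demand to performance is an assumption.
    """
    if secondary_demand <= 0:
        return 1.0  # a task that demands nothing is trivially satisfied
    spare = spare_capacity(primary_demand, total_capacity)
    return min(1.0, spare / secondary_demand)


if __name__ == "__main__":
    # Increasing primary task difficulty leaves less capacity for the secondary task.
    for demand in (0.3, 0.6, 0.9):
        print(demand, round(secondary_task_performance(demand, 0.4), 2))
```

Under these assumptions, predicted secondary task performance stays at its ceiling until the primary task leaves too little spare capacity, and then falls as primary task demand continues to rise.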

Perhaps  the  most  important  recent  contribution  to  capacity  theory  has  been  made  by  Wickens 
(1984b).  Whereas  other  capacity  theories  have  assumed  a  single  pool  of  undifferentiated  resources 
available  to  all  stages  of  processing,  Wickens  proposed  the  existence  of  multiple  resources. 
According to this view, the human information processing system contains a number of commodities
which may be assigned "resource-like" properties (e.g., sharing, allocation). The major components
of multiple-resource theory will now be discussed.

3.2.2.  Resources 

The  concept  of  resources  may  be  loosely  defined  as  processing  facilities  existing  in  some  finite 
amount  (Navon  and  Gopher,  1979).  Other  researchers  have  referred  to  this  concept  as  effort, 
capacity,  and  attention  (Kahneman,  1973;  Moray,  1967;  Shiffrin,  1976).  Multiple-resource  theory 
assumes  that  these  resources  reside  in  separate  "reservoirs"  or  "pools"  (Figure  4).  This  is  contrary 
to  single-resource  theory  which  assumes  one  undifferentiated,  shareable  pool  of  resources. 

Wickens (1980, 1984b) has developed a framework for determining the functional composition of
these attentional resource reservoirs based upon the results of a large number of dual-task studies.
This framework defines an operator's resources along three dimensions: stages
of processing (perceptual/central processing versus response), codes of perceptual and central processing
(verbal versus spatial), and modalities of input (visual versus auditory) and of
response (manual versus vocal) (Figure 5).
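
One way to work with this framework is to describe each task by the set of resources it draws on and then look at what two tasks have in common. The Python sketch below does only that. The resource labels follow the dimensions named above, but the assignment of labels to particular tasks, the task names, and the overlap score are hypothetical choices made for illustration; they are not taken from Wickens (1980, 1984b).

```python
# Each task is described by the resources it is assumed to demand.
# These profiles are illustrative guesses, not measured values.
VOICE_COMMUNICATION = frozenset({"auditory input", "verbal code", "vocal response"})
COMPENSATORY_TRACKING = frozenset({"visual input", "spatial code", "manual response"})
SPOKEN_DIGIT_RECALL = frozenset({"auditory input", "verbal code", "vocal response"})


def shared_resources(task_a, task_b):
    """Return the resources demanded by both tasks, in alphabetical order."""
    return sorted(task_a & task_b)


def overlap_score(task_a, task_b):
    """Share of the smaller task's resources also demanded by the other task.

    0.0 means no common resources; 1.0 means the smaller profile is fully shared.
    This score is a hypothetical convenience, not a validated metric.
    """
    smaller = min(len(task_a), len(task_b))
    if smaller == 0:
        return 0.0
    return len(task_a & task_b) / smaller


if __name__ == "__main__":
    print(shared_resources(VOICE_COMMUNICATION, COMPENSATORY_TRACKING))
    print(overlap_score(VOICE_COMMUNICATION, SPOKEN_DIGIT_RECALL))
```

With these assumed profiles, the voice task and the tracking task share no resources, while two tasks that are both auditory, verbal, and vocal overlap completely.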

Due to the independent, nonoverlapping characteristics of Wickens' concept of resources, a number
of implications follow: (1) tasks demanding completely nonoverlapping resources will always
be perfectly time-shared, and (2) if two tasks utilize partially separate resources, the degree of interference
(time-sharing efficiency) will be unaffected by the distance between the nonoverlapping
components of these resources. Wickens (1984b) provides data supporting these implications.

Figure 4. Difficulty Performance Trade-Offs [(A) Tasks A and B Share Resources I and II,
(B) Tasks A and B Demand Exclusively Resources I and II] (Wickens, 1984b)


Figure 5. Resource Modalities (Wickens, 1984b)

The usefulness of the multiple resource concept to the applied science community is that it allows
the  researcher  or  designer  to  predict  what  combinations  of  task  components  have  the  potential  to 
cause  poor  operator  performance.  This  information  may  then  be  used  to  reevaluate  and  redesign 
equipment, tasks, or strategies which will result in optimal operator performance.

Section 4

OPERATOR  PERFORMANCE  MEASURES 


The human performance literature contains a large number of theories and techniques for measurement
of human performance in a variety of situations. Because the nature of AAMRL/BBA's
current  research  on  the  effectiveness  of  voice  communication  in  interactive  environments  involves 
such  a  strong  cognitive  component,  SRL  has  chosen  to  concentrate  a  large  portion  of  its  literature 
survey  efforts  in  the  area  of  mental  workload.  Use  of  the  mental  workload  literature,  as  opposed 
to  some  of  the  other  areas  of  the  human  performance  literature,  has  a  number  of  advantages.  First, 
mental  workload  research  is  founded  on  the  basic  principles  of  psychology  and  physiology.  Both 
theoretical and applied research in this area have utilized basic principles in human information processing,
cognition, learning and memory, arousal, motivation, etc. Second, the majority of
research on mental workload has been conducted for eventual application in operational environments,
specifically the flight environment. Finally, the mental workload literature contains well-defined
guidelines for application of its theory and its tasks to specific situations (both experimental
and operational). These guidelines are based upon well-documented empirical evidence.

The  following  sections  contain  a  review  of  a  large  amount  of  the  mental  workload  literature,  as 
well as some of the more basic literature on human performance. This review should not, however,
be viewed specifically with respect to the assessment of workload. Rather, it should be
viewed  as  a  useful  framework  with  which  to  view  more  basic  research  on  human  performance. 

4.1.  THE  CONCEPT  OF  WORKLOAD 

Mental  workload  has  been  defined  in  the  literature  in  a  broad  variety  of  ways.  Cooper  and  Harper 
(1969), for example, define workload as "the integrated physical and mental effort required to perform
a specified task." Tennstedt (as cited in Roscoe, 1978) defines workload as "a summation
of such processes as perception, evaluation, decision making, and actions taken to accommodate
those needs generated by influences originating within or without the system." Many other authors
have defined workload in many other ways. However, the majority of current definitions (Hart,
1982; O'Donnell and Eggemeier, 1983; Roscoe, 1978) define workload as comprising the
following contributing factors: task demands, operator variables, and operator response (outcome).
Task demands include such factors as difficulty, time constraints, time pressure, and criticality.
Operator variables include effort, motivation, skill, experience, stress, personality, and
fatigue. Finally, response considerations include mode of response (e.g., manual, verbal, simple,
complex), feedback, and success. Figure 6 illustrates these factors and the ways in which they
may combine to produce workload. Figure 7 illustrates a conceptual framework through which to
view the various workload factors and the ways in which they interact to affect performance. Thorough
considerations of the concept of workload can be found in O'Donnell and Eggemeier (1983),
Moray (1979), Roscoe (1978), and Wickens (1984a). For purposes of this report, however, workload
will be defined as a multifaceted concept formed by the interactions of the demands of the
task(s), operator effort, and performance outcome.

Figure 6. The Factors Comprising Workload

Figure 7. A Conceptual Framework

4.2.  PERFORMANCE  ASSESSMENT  METHODS 

4.2.1.  Metric  Selection  Criteria 

A  number  of  criteria  to  assist  in  the  selection  of  workload  and  other  operator  performance  measures 
have  been  discussed  in  the  literature.  Because  the  majority  of  operator  performance  measures  carry 
with  them  a  number  of  intrinsic  constraints  and  methodological  requirements,  thereby  limiting  their 
applicability, consideration of these selection criteria is important. This section will review the criteria
to be used when selecting operator performance techniques for particular applications. The
criteria  to  be  discussed  include  sensitivity,  diagnosticity,  intrusion,  implementation  requirements, 
and  operator  acceptance. 

•  Sensitivity:  Sensitivity  is  the  term  used  in  the  literature  to  describe  the  ability  of  a  measurement 
technique  to  detect  changes  in  operator  load  that  are  caused  by  performance  of  a  task  or  group 
of  tasks  (Chiles,  1982;  Wickens,  1984a).  Workload  techniques,  in  particular,  have  been  found 
to  differ  in  sensitivity  (Bell,  1978;  Hicks  and  Wierwille,  1979;  Wickens  and  Yeh,  1983;  and 
Wierwille  and  Casali,  1983a).  It  is  important  to  match  the  sensitivity  of  a  technique  with  the 
requirements  of  an  application.  In  some  situations,  a  relatively  insensitive  technique  may  be 
sufficient,  for  example,  if  it  is  required  only  to  identify  areas  of  extreme  workload  in  a  system 
or  procedure.  Other  applications  may  require  finer  discriminations  of  load  (for  example,  when 
determining crew compositions). The required sensitivity of a measurement technique is determined
by the objective or goal to be satisfied by the use of that technique. If the objective is to
determine  whether  a  task  or  system  already  contains  levels  of  loading  which  could  lead  to 
degraded  operator  performance,  primary  task  measurement  techniques  will  be  adequate.  If  the 
goal  is  to  determine  whether  or  not  the  potential  for  overload  and  degraded  performance  exists, 
more  sensitive  techniques  (physiological,  secondary  task,  and  subjective)  should  be  used. 

• Diagnosticity: The criterion of diagnosticity is based upon the multiple resources theory of
capacity limitations of the human information processing system (see Section 3). Diagnosticity
describes the ability of a measurement technique to discriminate the amount of task load
imposed upon the different cognitive resources of the operator (e.g., perceptual versus central
processing). Techniques have been found to differ in their degree of diagnosticity (Reid,
Shingledecker,  and  Eggemeier,  1981;  Wickens  and  Derrick,  1981;  Wierwille  and  Casali, 
1983b). 

The  diagnosticity  of  specific  measurement  techniques  will  be  discussed  further  in  later  sections; 
however, subjective and primary task measures have generally been shown to exhibit low diagnosticity,
while secondary and physiological measures are considered highly diagnostic. As
with sensitivity, the choice between using a diagnostic versus a global measurement technique
should be determined by the objective to be met. If the goal of the research effort is simply to
determine if a loading problem exists somewhere in the system, techniques associated with
low diagnosticity (i.e., subjective, primary task) will be adequate. If specific information concerning
the locus of a previously identified problem (e.g., to suggest design modifications) is
desired, more diagnostic techniques (secondary task, physiological) should be chosen.

•  Intrusion:  Intrusion  refers  to  the  degree  to  which  a  measurement  technique  degrades  ongoing 
primary  task  performance.  Certain  degrees  of  primary  task  intrusion  may  be  acceptable  in 
some situations. In laboratory or simulation applications, intrusion may not be a great consideration.
However, in many operational applications, due to safety considerations, the use of
techniques which might cause degradations in primary task performance is precluded. Intrusion
can also cause problems in data interpretation. Measurement techniques which cause
significant  degradations  in  primary  task  performance  cannot  be  used  to  accurately  predict  the 
amount  of  load  required  for  unimpaired  performance  on  the  primary  task.  The  degree  of 
intrusion  associated  with  the  various  operator  performance  tasks  again  appears  to  differ. 
Although the database addressing this issue is small (see Casali and Wierwille, 1982; Ogden,
Levine, and Eisner, 1979; Rolfe, 1971; Wierwille and Connor, 1983), it appears that subjective
and physiological techniques are associated less with problems of intrusion than are
secondary task techniques.
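
The selection criteria above can be read as a small lookup problem: given what an application requires, which classes of technique remain candidates? The Python sketch below encodes only the qualitative statements made in this section. The coarse labels ("higher", "lower", "high", "low"), the table layout, and the function name are hypothetical simplifications; the intrusion entry for primary task measures is left unset because this section does not characterize it.

```python
# Qualitative properties of each technique class, as summarized in the text.
# None marks a property the text does not characterize.
TECHNIQUES = {
    "primary task":   {"sensitivity": "lower",  "diagnosticity": "low",  "intrusion": None},
    "secondary task": {"sensitivity": "higher", "diagnosticity": "high", "intrusion": "higher"},
    "subjective":     {"sensitivity": "higher", "diagnosticity": "low",  "intrusion": "lower"},
    "physiological":  {"sensitivity": "higher", "diagnosticity": "high", "intrusion": "lower"},
}


def candidate_techniques(need_sensitive=False, need_diagnostic=False, avoid_intrusion=False):
    """Return the technique classes that satisfy every requested criterion."""
    selected = []
    for name, props in TECHNIQUES.items():
        if need_sensitive and props["sensitivity"] != "higher":
            continue
        if need_diagnostic and props["diagnosticity"] != "high":
            continue
        # Classes with unknown intrusion are excluded when intrusion must be avoided.
        if avoid_intrusion and props["intrusion"] != "lower":
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    # Example: an operational setting that needs a diagnostic, low-intrusion measure.
    print(candidate_techniques(need_diagnostic=True, avoid_intrusion=True))
```

A table of this kind is only a screening aid. The final choice still depends on the objective of the measurement effort, as the criteria themselves emphasize.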

4.2.2.  Major  Classes  of  Assessment  Techniques 

Currently, there exist three major classes of human performance and workload measurement techniques:
physiological techniques, subjective techniques, and performance-based techniques.

4.2.2.1. Physiological Techniques

The rationale for using physiological measures to study aspects of human performance such as
workload is based upon the concept of "activation" or "arousal" (Roscoe, 1978). Arousal can be
defined as "a state of preparedness of the body associated with increased activity in the nervous
system"  (Roscoe,  1978).  It  is  assumed  that  arousal  and  performance  are  directly  related,  so  that 
varying  levels  of  physiological  activity  should  provide  realistic  estimates  of  differing  levels  of 
workload  or  performance.  Implicit  in  this  assumption  is  the  need  to  monitor  not  only  physiological 
activity,  but  performance  as  well. 

The  overall  usefulness  of  physiological  techniques  as  measures  of  operator  performance  is  unclear. 
O'Donnell and Eggemeier (1986) argue that, although the concept of measuring workload through
physiological processes would seem simple, a majority of such efforts have failed to find consistent
patterns of physiological change to correspond with known changes in workload. Hassett
(1978)  has  suggested  that,  rather  than  viewing  physiological  measures  as  global  indices  of  effort, 
arousal,  or  activation,  they  should  be  viewed  instead  as  potential  indices  of  specific  psychological 
processes. 

The  following  subsections  discuss  some  of  the  more  commonly  used  physiological  measures  of 
mental  workload  and  human  performance.  These  measures  will  be  discussed  only  briefly;  further 
detail  can  be  found  in  O'Donnell  (1979). 

4.2.2.1.1. Measures of Brain Function

The  electroencephalogram  (EEG)  records  the  brain's  activity  via  surface  electrodes  placed  directly 
on  the  scalp.  Such  measures  have  been  taken  during  and  after  the  performance  of  a  task  in  hopes 
that  the  overall  activation  level  in  the  brain  would  change  directly  as  a  function  of  imposed  task 
load.  Such  techniques  have  not,  however,  yielded  consistent  or  interpretable  results  (Lawrence, 
1979; O'Donnell and Eggemeier, 1986; O'Donnell and Wilson, 1987; Roscoe, 1978). Other
measures of brain function, such as various measures of evoked cortical response, have shown
more impressive results: signal analysis techniques (Callaway, Tueting, and Koslow, 1978);
transient cortical evoked response (Lawrence, 1979; O'Donnell, 1979; Squires, Wickens, Squires,
and Donchin, 1976); transient response to primary task (Gomer, Spicuzza, and O'Donnell, 1976);
steady state evoked response (Reagan, 1977); and multiple site recording (Doyle, Ornstein, and
Galin, 1974; Gevins, 1983). These measures appear to be useful for assessing the performance
effects  of  task  load. 

4.2.2.1.2. Measures of Eye Function

Measures  of  eye  function  are  valuable  methods  for  task  performance  assessment  because  of  their 
low intrusiveness, high operator acceptance, and ease of implementation. The most frequently used
measures include pupillary response, eye fixation, scanning patterns, eye blinks, and movement
speed.  These  measures  have  generally  yielded  consistent  and  sensitive  results;  however,  they  are 
relatively  undiagnostic,  providing  only  very  global  indications  of  task  load. 

4.2.2.1.3. Measures of Cardiac Function

The  electrocardiogram  (ECG),  blood  pressure,  blood  volume,  and  oxygen  concentration  have  all 
been  used  as  cardiac  indicators  of  overall  workload  and  specific  task  load  (O'Donnell  and 
Eggemeier,  1986).  Although  measures  of  cardiac  function  have  been  somewhat  successful  as 
predictors  of  workload  in  several  studies  (Casali  and  Wierwille,  1983;  Hicks  and  Wierwille,  1979; 
Wierwille  and  Connor,  1983),  it  is  unclear  exactly  how  cardiac  function  changes  with  different 
types  and  amounts  of  task  load  (O'Donnell  and  Eggemeier,  1986).  Until  more  data  are  established, 
these  measures  must  be  considered  potentially  useful  but  unvalidated  measures  of  task  load. 

4.2.2.1.4. Measures of Muscle Function

Myoelectric  signals  generated  by  muscle  contractions  have  also  been  used  to  measure  mental  and 
physical  workload  using  an  electromyograph  (EMG).  These  signals  are  measured  either  with 
surface  electrodes  placed  directly  over  the  muscle,  or  needle  electrodes  placed  directly  into  the 
muscle.  Physical  work  is  indicated  by  the  actual  muscle  activity  at  the  specified  muscle,  while 
mental work is indicated by the static tension level of a muscle not directly involved in the performance
of the task (O'Donnell and Eggemeier, 1986). Current measures of muscle function, due to
the necessities of their measurement techniques, are not recommended, as they are not simple, sensitive,
or diagnostic, and have obvious intrusion and safety limitations.

4.2.2.2. Subjective Techniques

Subjective  measures  of  operator  effort  and  task  load  require  the  operator  to  report  the  amount  of 
"load"  experienced  in  the  performance  of  a  particular  task  or  set  of  tasks.  The  majority  of  such 
techniques  described  in  the  literature  are  designed  specifically  to  assess  "workload,"  rather  than 
simple  "task  load."  However,  as  the  two  concepts  are  similar  (task  load  may  be  viewed  as  one 
component of workload), subjective workload assessment techniques will be described and discussed
with the assumption that they are also applicable to task loading situations.

Subjective  measures  have  been  used  extensively  to  assess  operator  workload  due  to  their  practical 
advantages (ease of implementation, nonintrusiveness, high operator acceptance), and their capability
of discriminating among different levels of load (sensitivity) (Moray, 1982; O'Donnell and
Eggemeier, 1986; Williges and Wierwille, 1979). Subjective measures are not, however, considered
diagnostic (O'Donnell and Eggemeier, 1986). Available evidence (O'Donnell and
Eggemeier,  1986)  suggests  current  measures  represent  a  global  measure  of  load  and,  therefore, 
should  be  used  as  general  screening  devices  to  determine  if  overload  exists  anywhere  within  task 
performance.  The  most  commonly  used  rating  scales  and  psychometric  subjective  workload 
assessment  techniques  will  now  be  described. 

4.2.2.2.1. Rating Scales

The  Cooper-Harper  Aircraft  Handling  Characteristics  Scale  (Cooper  and  Harper,  1969),  designed 
for  use  by  test  pilots  to  assess  the  ease  of  control  of  various  aircraft,  has  been  used  extensively  as 
an  index  of  mental  workload.  This  ten-point  rating  scale  requires  the  pilot  to  judge  the  adequacy  of 
an aircraft for some specified task or operation. The assumption that handling qualities and operator
workload are directly related has been supported by a number of research efforts (Hess, 1977;
Moray,  1982;  Williges  and  Wierwille,  1979). 

Modified  Cooper-Harper  rating  scales  (North,  Stackhouse,  and  Graffunder,  1979;  Wierwille  and 
Casali, 1983a; Wolfe, 1978) have also been used to measure mental workload. These scales are
quite similar to the original Cooper-Harper scale, with the exception that references to aircraft handling
characteristics in the original scale were replaced by descriptors of pilot workload effort. As
with the original Cooper-Harper scale, available data support their sensitivity to varying levels of
load but again suggest their lack of diagnosticity (North et al., 1979; Wierwille and Casali, 1983b;
Wolfe, 1978).

Two  other  rating  scales  have  been  used  to  measure  factors  associated  with  workload.  These  scales, 
generally known as University of Stockholm Scales, measure the perceived difficulty and perceived
effort of the operator. These rating scales have been used in conjunction with intelligence
test items (reasoning, spatial ability, and verbal comprehension) (Bratfisch, 1972; Bratfisch, Borg,
and Dornic, 1972; Hallsten and Borg, 1975), visual discrimination tasks, letter transformation
tasks, digit transformation tasks, and visual auditory detection tasks (Bratfisch, Borg, and Dornic,
1972; Dornic, 1980).

Overall, rating scales used as subjective measures of mental load have proven to be sensitive indicators
of operator effort and expenditure. They are nonintrusive, are easily implemented, appear to
have high operator acceptance, and are generally not time-consuming. Rating scale measurements
are, however, relatively undiagnostic and should be interpreted as global indicators of operator
mental load.

4.2.2.2.2. Interviews and Questionnaires

Interviews and questionnaires have also been used as techniques to gather subjective data on operator
mental load. Williges and Wierwille (1979) describe the variety of these procedures, which
range from open-ended debriefing sessions to carefully designed questionnaires. As these techniques
are less structured than rating scales, obtained data may be difficult to interpret. These techniques
can, however, be valuable when used in conjunction with other measures, by providing
information  which  might  not  otherwise  be  obtainable.  Again,  it  is  recommended  that  questionnaires 
and  interviews  not  be  relied  upon  as  stand-alone  techniques  for  assessing  operator  load. 

4.2.2.2.3. Psychometric Techniques

Psychometric measures which have been used to assess operator load include magnitude estimation,
paired comparison, equal-appearing intervals, and conjoint measurement. These methods can
generate  interval-scaled  data  which  provide  certain  interpretation  advantages  in  data  analysis  over 
many  of  the  other  subjective  assessment  techniques.  Detailed  descriptions  of  these  techniques  are 
provided  in  O'Donnell  and  Eggemeier  (1986). 

4.2.2.3. Performance Based Measures

Performance  based  measures  derive  an  index  of  operator  loading  from  some  aspect  of  operator 
behavior  or  activity  (i.e.,  task  performance).  These  measures  are  also  referred  to  as  behavioral 
measures  (Shingledecker,  1983;  Williges  and  Wierwille,  1979). 

4.2.2.3.1. Primary Task Measures

Primary task methods measure the operator's performance on some task or design option of interest.
It is assumed that, as the load on the operator increases, performance of the task will change,
usually resulting in some amount of degradation. Measurement of that degradation is used to provide
an index of the load associated with the task. The workload literature describes two types of
primary  task  measures:  single  and  multiple  primary  task  measures. 

• Single Primary Task Measures: Single primary task measures use a single aspect of primary
task performance (number of errors, speed of performance, etc.) as an indication of operator
load. In this paradigm, the primary task measure should be chosen to reflect an aspect of performance
that is expected to be influenced by the manipulation of the load. This is often
difficult. However, it is a critical consideration as the success of the evaluation is dependent on
a single parameter of performance.


Many  successful  applications  of  this  paradigm  are  described  in  the  literature  (Hicks  and 
Wierwille,  1979;  Isreal,  Chesney,  Wickens,  and  Donchin,  1980;  Kraus  and  Roscoe,  1972; 
Wierwille  and  Connor,  1983;  Williges  and  Wierwille,  1979).  Single  primary  task  measures 
have  successfully  distinguished  variations  in  load,  especially  across  moderate  levels  of  load,  as 
well  as  discriminating  overload  from  nonoverload  situations. 

Instances in which appropriate single primary task measures have failed to reflect manipulations
of task load have also been reported (Bell, 1978; Burke, Gilson, and Jagacinski, 1980;
Schultz,  Newell,  and  Whitbeck,  1970). 

• Multiple Primary Task Measures: Data can also be collected on multiple aspects of a primary
task. This paradigm is generally used in simulated or real-world environments or in complex
laboratory task situations. Generally, both error and latency data are gathered for several dependent
variables (DVs). The assumed advantage in using multiple primary task measures is that
they will provide greater sensitivity to changes in operator load by: (1) decreasing measurement
error via combined data analysis of multiple DVs, and (2) allowing for the assessment of
more than one resource or skill, thereby increasing the precision and utility of the measure.
Although the selection of task parameters for this methodology is not as critical as for the single
task methodology, parameters to be measured should again be chosen based upon their potential
to be influenced by different load manipulations. O'Donnell and Eggemeier (1986) caution
that  this  is  an  important  consideration,  as  data  collected  simply  because  of  availability  may  not 
be  meaningful. 

Multiple  primary  task  measures,  as  with  single  task  measures,  have  produced  mixed  results  as 
to  their  capability  to  distinguish  among  different  levels  of  load.  A  number  of  experiments 
(Dorfman  and  Goldstein,  1975;  Goldstein  and  Dorfman,  1978;  Hicks  and  Wierwille,  1979) 
have found multiple primary task measures sensitive to variations in load. Others (Brecht,
1977; Finkelman, Zeitlin, Filippi, and Friend, 1977) have found that some measures fail to
discriminate  variations  in  load  that  were  detected  by  other  assessment  techniques.  Again, 
although  multiple  primary  task  measures  may,  in  certain  instances,  discriminate  overload  from 
nonoverload  situations,  they  generally  do  not  provide  clear  diagnostic  statements  as  to  the 
specific  resources  being  overloaded.  Therefore,  multiple  primary  task  measures,  like  single 
task  primary  measures,  should  be  regarded  as  global  measures  of  operator  load. 


The principal utility of primary task performance measures is in determining whether the load
associated with a system (task, equipment, environment) will degrade operator performance. In
such applications, where diagnostic capability is not required, either single or multiple primary
task measures will provide adequate information.

4.2.2.3.2. Secondary Task Measures

Secondary task measures of operator load require the concurrent performance of two tasks by the
operator. The task of central interest is generally termed the "primary task," while the additional
task is termed the "secondary task." The estimate of operator load is obtained from the operator's
performance on the secondary task. Secondary task methodology may be used in a variety of
applications, including the measurement of operator effort, attentional demand, the effect of
stressors, and the adequacy of displays. It is most often used as a measure of the spare or residual
capacity of the operator as he performs the primary task (see Section 3). Secondary task
methodologies have proven to be both sensitive to differences in capacity expenditure and diagnostic of
primary task demand (they are capable of discriminating some differences in resource expenditure,
e.g., central processing versus motor output). Some intrusion problems (intrusion being the degree to
which the secondary task degrades primary task performance) have been reported (Ogden et al., 1979;
Williges and Wierwille, 1979). In attempts to alleviate the intrusion problem, several techniques
(e.g., embedded secondary task, adaptive procedures) have been designed. These techniques are
reviewed in O'Donnell and Eggemeier (1986).

In most applications of secondary task measures, the operator is instructed to maintain error-free
performance on one task at the expense of the other. Depending upon the goal of the experimenter,
one of two different methodologies may be used: the loading task paradigm or the subsidiary task
paradigm. The loading task paradigm instructs the operator to maintain a certain level of
performance on the secondary task, even if decrements in primary task performance result. The
assumption of this paradigm is that the additional load imposed on the operator by the secondary task will
shift total operator load from Region A to Region B (Figure 8), inducing degradations in
performance of the primary task. When levels of secondary task load are equal, performance degradation
will be greater for difficult primary tasks than for easy primary tasks. Degradations in primary task
performance that occur at specific levels of secondary task load are used as an index of the
resulting cognitive load (workload) associated with the primary task. Secondary task performance
is measured directly to ensure that the specified criterion levels are maintained and that the load
imposed by the task is equated across the various experimental conditions.

Figure 8. Hypothetical Relationship Between Workload and Operator Performance
(O'Donnell and Eggemeier, 1986). Horizontal axis: level of operator workload.

The loading task paradigm has been used primarily to simulate the effects of information
processing demands that are absent from the laboratory, but are expected to occur in the operational
environment. Dougherty, Emery, and Curtin (1964), for example, used a loading task paradigm to
evaluate two cockpit display options (conventional versus pictorial). Primary task measures had
previously indicated no differences in the cognitive load imposed upon pilots by the two displays.
Addition of a secondary digit naming task, however, caused decrements in flight performance (the
primary task) under the conventional display condition. Since equivalent levels of secondary task
load did not lead to performance decrements in the pictorial display condition, it can be concluded
that the pictorial display imposed less load on the pilot than did the conventional display. This
example illustrates the ability of secondary loading tasks to increase the operator's load in the
laboratory, making it more representative of the operational environment and increasing the
sensitivity of primary task measures. Other applications of the loading task paradigm have included the
evaluation of methods of task performance, the evaluation of display configurations, and the
effects of stressors (noise, heat) on primary task performance (Ogden et al., 1979; Rolfe, 1971).

The second secondary task paradigm (i.e., the subsidiary or reserve capacity task paradigm) is
more frequently used. In this paradigm, the subject is instructed to avoid degraded primary task
performance at the expense of the secondary task. The secondary task, rather than being used to
load the primary task (as in the loading task paradigm), is now used to determine how much
additional work the operator may do while performing the primary task at some specified level (its
single task baseline level). The assumption of the subsidiary task paradigm is that, as the second
task is added, decrements in that task will result (again, as measured against its single task baseline
level). These decrements will then serve as a measure of the reserve capacity of the operator when
performing the primary task (Brown, 1964; Knowles, 1963). The subsidiary task paradigm has
been used to measure reserve capacity for a variety of purposes, including evaluation of
instruments and displays, operating conditions and procedures, and the effects of extended practice on
performance (Ogden et al., 1979; Rolfe, 1971; Williges and Wierwille, 1979).
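The single-to-dual task decrement described above can be expressed numerically. The sketch below is illustrative only: the percent-decrement formula, the function name, and the example scores are assumptions rather than a procedure taken from this report, and the score is assumed to be one where higher values mean better performance.

```python
def percent_decrement(single_task_score: float, dual_task_score: float) -> float:
    """Percent drop in secondary task performance from its single-task baseline.

    Assumes a score where higher is better (e.g., number of correct responses).
    """
    if single_task_score == 0:
        raise ValueError("single-task baseline must be nonzero")
    return 100.0 * (single_task_score - dual_task_score) / single_task_score


# Hypothetical scores: 50 correct responses alone, 40 when paired with the primary task.
decrement = percent_decrement(50, 40)
print(f"Secondary task decrement: {decrement:.1f}%")
```

Under the subsidiary task paradigm, a larger decrement would be read as less reserve capacity, provided primary task performance stayed at its own baseline.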

4.3. METHODOLOGICAL CONSIDERATIONS FOR SECONDARY TASK TECHNIQUES

When using secondary task techniques to measure operator load, there are a number of
methodological guidelines to be considered. A number of general guidelines for the use of secondary task
techniques have already been discussed in previous sections. These guidelines are shown again in
Table 2. A thorough review of these guidelines, including specific techniques to minimize primary
task intrusion, techniques to ensure secondary task sensitivity, interpretation of single-to-dual task
performance decrements, and the most frequently used types of secondary tasks, can be found in
O'Donnell and Eggemeier (1986).


TABLE  2.  METHODOLOGICAL  GUIDELINES  FOR  APPLICATIONS  OF  SECONDARY 
TASK  METHODOLOGY  (O’Donnell  and  Eggemeier,  1986) 


1. In the loading task paradigm, subjects should be instructed to maintain secondary task
performance at single-task baselines under concurrent task conditions.

2. In the subsidiary task paradigm, subjects should be instructed that primary task performance
should be maintained at single-task baseline levels under concurrent task conditions.

3. In both paradigms, baseline measures of single-task performance on both the primary and
secondary tasks should be taken. In the loading task paradigm, primary task baselines are
required to assess differences in primary task performance that might occur under concurrent
task conditions. Secondary task baselines are required to ensure that the secondary task is
performed to the criterion set by the experimenter. In the subsidiary task paradigm, primary
task baseline performance is required to evaluate any intrusion effects that might occur.
Baseline secondary task measures are required to evaluate properly the degree of single-to-dual task
decrements which might occur.

4. In both paradigms, employ several levels of secondary task difficulty. Higher levels of
secondary task difficulty may distinguish differences in workload between design options or tasks
that are not distinguished by lower levels of secondary task difficulty. The theoretical basis for
such difficulty effects is that lower levels of secondary task difficulty may not be sufficient to
shift total workload from Region A to B (Figure 8), whereas more difficult levels may do so.

5. In the subsidiary task paradigm, consider the use of various techniques that have been
proposed to reduce or eliminate primary task intrusion. Two major classes of these techniques
include adaptive secondary methodology and embedded secondary tasks.

6. In both paradigms, attempt to ensure maximum secondary task sensitivity through choice of an
appropriate task and through use of sufficient practice to achieve stable performance on the
secondary task prior to its use.

Section 5

EXISTING  U.S.  AIR  FORCE  WORKLOAD  BATTERIES 


5.1.  BACKGROUND 

Workload batteries are collections of a number of experimental tasks which can be used to
investigate a variety of research issues or questions concerning human performance and workload. Each
individual battery task may be used alone or in conjunction with the other tasks in the battery. The
utility of a workload battery is that it can be used to provide both global and diagnostic
information. By using a specified task or set of tasks in the battery, the researcher may receive a global
assessment of a particular situation or a general answer to a specific research question (e.g., "Is
there a significant amount of workload associated with this system?"). The Unified Tri-Services
Cognitive Performance Assessment Battery (UTC-PAB), for example, specifies a set of five tasks
(each task representative of one of the major human information processing functions) to be used
for initial global screening. Based upon the results of the initial screening, other tasks are specified
by UTC-PAB for use in further, more diagnostic investigation.

In addition to the various types of information which can be obtained from workload batteries,
these batteries have a number of methodological advantages. Existing batteries contain
well-documented experimenter instructions, subject instructions, and guidelines for use of the various
tasks and task sets. Some existing batteries have been implemented in user-friendly
microcomputer software. This software aids in both data collection and analysis. Finally, existing batteries
contain documentation of the sensitivity and reliability of their various tasks.

Since  at  least  1980,  AAMRL  has  been  developing  and  collecting  different  tasks  for  compilation 
into  test  batteries  (Eggemeier,  1981).  Two  major  test  batteries  that  have  been  developed  are  the 
Criterion  Task  Set  (CTS)  by  Shingledecker  (1984),  and  the  UTC-PAB  (Perez,  Masline,  Ramsey, 
and  Urban,  1987).  The  CTS  was  designed  to  place  selective  demands  on  the  basic  mental 
resources  and  information  processing  functions  of  the  subjects.  The  UTC-PAB  was  designed  to 
evaluate  cognitive  performances  of  test  subjects. 

5.1.1.  Criterion  Task  Set 

The  CTS  was  developed  as  a  research  tool  for  applied  experimentation  of  human  performance 
capabilities.  The  test  battery  is  made  up  of  nine  standardized  tasks.  Eight  of  the  nine  tests  can  be 
presented  with  three  different  levels  of  difficulty.  The  single  level  test  is  a  finger  tapping  test, 
designated as the Interval Production Task. All of the tasks are listed below:

1. Probability Monitoring

2.  Continuous  Recall 

3.  Memory  Search 

4.  Linguistic  Processing 

5.  Mathematical  Processing 

6.  Spatial  Processing 

7.  Grammatical  Reasoning 

8.  Unstable  Tracking 

9.  Interval  Production 

All  of  these  tests  are  implemented  in  user-friendly  software  on  an  inexpensive  microcomputer 
system.  A  user's  guide  has  been  developed  to  provide  information  on:  (1)  system  hardware,  (2) 
system  assembly,  (3)  data  collection,  and  (4)  data  analysis  (Acton  and  Crabtree,  1985). 

5.1.2.  Unified  Tri-Services  Cognitive  Performance  Assessment  Battery 

The  UTC-PAB  was  developed  to  evaluate  cognitive  performance  of  subjects  for  a  chemical  defense 
biomedical  drug  screening  program.  The  tests  were  selected  by  the  Tri-Service  Joint  Working 
Group  on  Drug  Dependent  Degradation  of  Military  Performance  (JWGD3  MILPERF).  A  report  by 
England, Reeves, Shingledecker, Thorne, Wilson, and Hegge (cited in Perez et al., 1987) details
the  history  and  selection  criteria  for  the  UTC-PAB.  The  test  battery  consists  of  25  tests  that 
evaluate  six  different  cognitive  processes.  These  cognitive  areas  and  their  associated  tests  are  listed 
below: 

1. PERCEPTUAL INPUT, DETECTION, AND IDENTIFICATION

    Visual Scanning Task
    Visual Probability Monitoring Task
    Pattern Comparison (Simultaneous)
    Four-Choice Serial Reaction Time

2. CENTRAL PROCESSING

    Auditory Memory Search (Memory Search Tasks)
    Continuous Recognition Task
    Code Substitution Task
    Visual Memory Search (Memory Search Tasks)
    Item Order Test

3. INFORMATION INTEGRATION/MANIPULATION - LINGUISTIC/SYMBOLIC

    Linguistic Processing Task
    Two-Column Addition
    Grammatical Reasoning (Symbolic)
    Mathematical Processing Task
    Grammatical Reasoning (Traditional)

4. INFORMATION INTEGRATION/MANIPULATION - SPATIAL MODE

    Spatial Processing Task
    Matching to Sample
    Time Wall
    Matrix Rotation Task (Spatial Processing Task)
    Manikin Test
    Pattern Comparison (Successive)

5. OUTPUT/RESPONSE EXECUTION

    Interval Production Task
    Unstable Tracking Task

6. SELECTIVE/DIVIDED ATTENTION

    Dichotic Listening Task
    Memory Search/Unstable Tracking Combination (Sternberg-Tracking Combination)
    Stroop Test

A full description of the purpose, history, and instructions for use of each of these tests is
reported by Perez et al. (1987). Like the CTS test battery, the UTC-PAB is also implemented in
user-friendly software capable of running on an inexpensive microcomputer system.

Section 6

COMMUNICATION  THEORY 


Communication or information theory is a mathematical attempt to define the limitations of a
specific communication system or process. The process of measuring communication can be
divided into three subproblems (Weaver, 1949/1964): a "technical" problem, a "semantic" problem,
and an "effectiveness" problem.

6.1.  THE  TECHNICAL  PROBLEM 

The technical problem of communication measurement is determining how accurately a set of
symbols (e.g., written speech) or a signal (e.g., radio transmission of voice) is transferred from a
sender to a receiver. Figure 9 represents the communication process at the technical level.


Figure 9. The Elements of a Communication System (Shannon and Weaver, 1949/1964)
(Recovered figure label: noise source.)

This process may be described as follows. The information source (i.e., the sender) selects the
desired message from a set of possible messages. This message is then converted by the
transmitter into the signal. The signal is sent over the communication channel to the receiver. Finally,
the receiver changes the transmitted signal back into a message and delivers it to the destination.

Shannon  (1948/1964)  developed  a  mathematical  theory  which  describes  communication  at  the 
technical  level.  Shannon's  theory  has  been  used  to  address  a  number  of  problems  concerning 
communication  systems,  including:  how  to  measure  the  amount  of  information  within  a  system, 
how  to  measure  the  capacity  of  a  communication  channel,  how  to  determine  the  characteristics  of 
an  efficient  coding  process,  how  noise  affects  the  accuracy  of  receiving  a  message,  how  to 
minimize  the  undesirable  effects  of  noise,  and  how  continuous  and  discrete  signals  differently 
affect  a  communication  system. 


6.1.1. 


In communication theory, the term "information" does not generally refer to its more ordinary
definitions of "meaning," "knowledge," "news," etc. "Information" in this situation generally
refers to the statistical rarity of a source of message symbols. Shannon (1948/1964) defined
information as "a measure of one's freedom of choice when one selects a message." In this
sense, information describes not the content of individual messages themselves, but the amount of
choice an individual has in selecting any particular message from the total message set. In other
words, information is defined by the uncertainty of events, with less certain events having a greater
amount of information associated with them. Mathematically, the "self-information" of an event
before or after transmission (given that the output is independent of the input) can be described as
(Systems Research Laboratories [SRL], 1987):

$$I(A = a_k) = I(a_k) = \log_2 \left[ \frac{1}{P(a_k)} \right] \text{ bits}$$


where:

$A = \{a_1, a_2, \ldots, a_K\}$, the set of inputs or the source alphabet/vocabulary.

$I(a_k)$ = the self-information of the event that $A = a_k$, or the information needed to make the
occurrence of event $a_k$ certain.

$P(a_k)$ = the probability that $A = a_k$ was transmitted.
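As a quick numerical illustration of the self-information formula, the following sketch computes the bits associated with each word of a small vocabulary. The function name and the probability values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def self_information(p: float) -> float:
    """Self-information, in bits, of an event that occurs with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1.0 / p)


# Hypothetical transmission probabilities for a four-word vocabulary.
probabilities = {"bank": 0.5, "rank": 0.25, "thank": 0.125, "yank": 0.125}

# Rarer words carry more information than common ones.
bits = {word: self_information(p) for word, p in probabilities.items()}
for word, value in bits.items():
    print(f"{word}: {value:.1f} bits")
```

A certain event (probability 1) carries zero bits, which matches the statement that less certain events have more information associated with them.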

The selection of a message in a communication system can occur in a number of ways. In the
simplest case (as described above), the information source is free to choose only between a few
predetermined or "canned" messages. More commonly, however, the information source
constructs each message individually, by making a sequence of choices from some set of symbols.
An everyday example is choosing one word after another to form a sentence.

According to Weaver (1949/1964), the consideration of statistical probabilities becomes important
for the measurement of communication because probabilities reflect the rules by which a message
is formed. As each successive symbol from the vocabulary set is chosen, the probabilities of
selecting the remaining symbols change. In other words, at any stage in the communication
process, the probability of selecting any symbol is determined by the preceding choices. In English
speech, for example, if the last selected symbol is "the," the probability that the next symbol is an
article is very small, while the probability that it is either an adjective or a noun is very great.
Systems in which sequences of symbols are chosen according to probabilities are called "stochastic
processes." Stochastic systems where probabilities depend directly upon the previous events are
called "Markoff processes."
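The dependence of each symbol on the preceding one can be illustrated by estimating transition probabilities from a handful of sentences. This is a minimal sketch of a first-order (one-word memory) model; the function name and the three example sentences are hypothetical and are not drawn from the report's transcripts.

```python
from collections import Counter, defaultdict


def transition_probabilities(sentences):
    """Estimate P(next word | current word) from adjacent word pairs."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return {
        current: {word: n / sum(followers.values()) for word, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical corpus, for illustration only.
corpus = ["the flaps are down", "the gear is down", "the flaps are up"]
probs = transition_probabilities(corpus)
print(probs["the"])  # "flaps" follows "the" twice as often as "gear"
```

In this toy corpus no article ever follows "the", so that transition has an estimated probability of zero, in line with the example given in the text.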


The  ability  of  a  communication  channel  to  transmit  information  is  described  as  its  "capacity." 
Generally,  capacity  is  defined  as  the  amount  of  information  transmitted  per  second,  measured  in 
bits  per  second. 

6.1.2.  Entropy 

The  entropy  of  a  source  alphabet  or  vocabulary  is  a  measure  of  the  randomness  or  uncertainty 
about  that  alphabet.  Entropy  can  also  be  thought  of  as  the  average  amount  of  information  per 
source  symbol.  Mathematically,  entropy  is  defined  as  (SRL,  1987): 

$$H[I(a_k)] = \sum_{k=1}^{K} P(a_k) \log_2 \left[ \frac{1}{P(a_k)} \right] \text{ bits}$$


where:

$H[I(a_k)]$ = the average amount of information per source symbol.
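The entropy definition can be checked with a short calculation. The sketch below assumes the symbol probabilities are supplied as a list that sums to one; the function name and the example distributions are illustrative assumptions.

```python
import math


def entropy(probabilities):
    """Average information per source symbol, in bits."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols with zero probability contribute nothing to the sum.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


# Four equally likely symbols give the maximum entropy for a four-symbol alphabet.
uniform = entropy([0.25, 0.25, 0.25, 0.25])
# An uneven distribution over the same alphabet is less uncertain.
skewed = entropy([0.5, 0.25, 0.125, 0.125])
print(uniform, skewed)
```

The uniform case yields 2 bits per symbol and the skewed case 1.75 bits, showing that entropy falls as the source becomes more predictable.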

6.1.3. Mutual Information

"Mutual information" describes the uncertainty in some symbol or vocabulary item (i.e., $a_k$) that is
resolved in the output of the system (i.e., $b_j$). The previously described measures of information
have considered the output of a system independent of the input (i.e., information before or after
transmission). However, in a real system, the output is dependent upon the input. The
self-information of the event $A = a_k$, given that event $B = b_j$ has occurred, can be described as (SRL,
1987):


$$I(a_k \mid b_j) = \log_2 \left[ \frac{1}{P(a_k \mid b_j)} \right] = -\log_2 \left[ P(a_k \mid b_j) \right]$$

This describes the amount of information that must be supplied to an observer to specify that
$A = a_k$ after the observer has received $B = b_j$ or, in other words, the amount of information that was
lost during transmission. The difference between the self-information of the event that $A = a_k$
[i.e., $I(a_k)$] and this quantity [i.e., $I(a_k \mid b_j)$] is the mutual information. This is a measure of the
gain in information due to the reception of a symbol ($b_j$). Mathematically, mutual information is
defined as (SRL, 1987):


$$I(a_k; b_j) = I(a_k) - I(a_k \mid b_j) = \log_2 \left[ \frac{P(a_k \mid b_j)}{P(a_k)} \right]$$

When the mutual information is averaged across the input alphabet or vocabulary, the "channel" or
"average mutual information" (AMI) is obtained. This is a measure of the information gain of an
entire system, not dependent on individual input and output symbols, but dependent on the symbol
frequencies. The AMI is represented as (SRL, 1987):

$$I(A; B) = \sum_{k=1}^{K} \sum_{j=1}^{J} P(a_k, b_j) \log_2 \left[ \frac{P(a_k, b_j)}{P(a_k) \, P(b_j)} \right]$$
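The AMI formula can be evaluated directly from a table of joint probabilities. In the sketch below, the joint distribution is assumed to be given as a nested list in which joint[k][j] holds P(a_k, b_j); the function name and the two example channels are hypothetical.

```python
import math


def average_mutual_information(joint):
    """Average mutual information, in bits, from joint[k][j] = P(a_k, b_j)."""
    p_a = [sum(row) for row in joint]           # marginal P(a_k)
    p_b = [sum(col) for col in zip(*joint)]     # marginal P(b_j)
    ami = 0.0
    for k, row in enumerate(joint):
        for j, p_ab in enumerate(row):
            if p_ab > 0:  # zero-probability pairs contribute nothing
                ami += p_ab * math.log2(p_ab / (p_a[k] * p_b[j]))
    return ami


# A noiseless two-symbol channel: the output always equals the input.
noiseless = average_mutual_information([[0.5, 0.0], [0.0, 0.5]])
# A useless channel: the output is statistically independent of the input.
independent = average_mutual_information([[0.25, 0.25], [0.25, 0.25]])
print(noiseless, independent)
```

The noiseless channel transmits the full 1 bit of source entropy, while the independent channel transmits 0 bits, which brackets the range expected for a two-symbol source with equally likely inputs.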


6.1.4.  Information  Theory  for  Assessing  Operator  Performance 

According to Wickens (1984a), a large amount of human performance theory is specifically
concerned with the problem of transmitting information. In any situation where an operator is
perceiving stimuli and somehow responding to those stimuli, the operator can be described as both
encoding and transmitting information. As an example, Wickens (1984a) describes an aircraft pilot
as someone who "must process a multitude of visual signals bearing on the status of the aircraft
while listening to auditory messages from air traffic control concerning flight plans and the status
of other aircraft." Information theory provides a metric that enables these information processes to
be quantified and described in ways that allow the many tasks of the aircraft pilot (or other human
operators) to be compared. When information theory is used in this way, it is assumed that
information processing efficiency can be associated with the amount of information an operator can
process per unit time (i.e., channel capacity), and that task difficulty can be associated with the rate
of information presentation (Wickens, 1984a).

Information theory has been a great asset to researchers investigating both communication
processes and operator performance. Wickens (1984a) states that information theory provides an
essentially dimensionless unit of performance across a wide variety of different dependent
variables. Fitts and Posner (1967) have also suggested that certain limits of the human information
processing system remain relatively invariant when described in the terms of information theory.
Despite these successes, however, the use of information theory in human performance
research/applications has received some criticism. Among these criticisms are limitations in the sensitivity
of the information metric, and the inability of information measures to describe the factors
influencing reaction time (RT). Wickens (1984a) offers a complete discussion of these criticisms.

Section 7

VOICE  COMMUNICATION  EFFECTIVENESS:  PROPOSED  TASKS 


This section describes the performance tasks selected for implementation in the
PACRAT test facility. Section 7.1 describes the communication scenario that has been
developed. Section 7.2 describes the secondary task selected to be performed concurrently with the
communication scenario. Finally, Section 7.3 describes an alternate scenario configuration which
could be further developed for use in the PACRAT facility.

7.1.  COMMUNICATIONS  SCENARIO 

A communication scenario was developed by SRL for implementation in the PACRAT test facility.
The scenario was constructed based upon the findings of the literature review and in accordance
with a number of predetermined constraints (Figure 10). These constraints were determined after
thorough consideration of both the characteristics of the PACRAT facility and AAMRL/BBA's
research interests and requirements. The communication scenario is a 30-minute sequence of
short, operationally realistic sentences which are verbally presented and require a series of manual
responses by the subject (Figure 11). It models a two-way, interactive, time-dependent
communication situation. Each message is a separate, complete sentence, two to six words in length.

7.1.1.  Selection  of  Vocabulary  and  Development  of  Message  Sentences 

The vocabulary used in the scenario was chosen from a combination of sets of confusable words
previously developed by AAMRL/BBA and transcriptions of actual Air Force pilot
communications. The confusable words were derived from 2,000 hours of Air Force in-flight communication
in an attempt to develop a standardized word intelligibility test using flight jargon words. Ten lists
of confusable words (Table 3) were selected for use. Each list consisted of 50 sets of four
"confusable" words. The confusability of each word set had been previously determined by the
acoustic and phonetic similarity of its words, and by data collected from their experimental use. The sets
resulted in a database (database A) of approximately 1900 words (some words appeared on more
than one list or in more than one confusable set). This database was then combined with a
database of approximately 2100 words (database B) selected from transcriptions of pilot
communications during various flight situations. These transcriptions had been previously analyzed for
entropy and mutual information (see Section 6). All words which were common to both databases
(database A and database B) were combined to form a third database (database C). This database

    All Information Presented Over the Communication Channel
    Words Used in Messages Chosen from List of Confusable Words
    May Be Used Alone or with Another Task
    30-Minute Duration
    Minimal Training Requirements
    Aircraft Oriented, but Understandable by Nonoperational Subjects
    Low Level Two-Way Interaction
    Easily Modifiable
    Structured Script
    Cost Associated with Message Repeats
    Cost Associated with Wrong Decision
    Forced to Make a Decision
    Time Constrained
    Get Through Message "Loop" Quickly

Figure 10. Scenario Constraints


Figure 11. VCE Communication Scenario
(30-minute scenario: messages pass between the experimenter and the operator, who
perceives/encodes, interprets, and responds.)


TABLE 3. LIST OF CONFUSABLE WORDS

 1.  Marked    Marsh     Marks     Mark
 2.  Blast     Fast      Past      Last
 3.  Cone      Code      Cove      Cold
 4.  Reached   Reach     Reef      Reads
 5.  Scan      Can       Span      Plan
 6.  Seemed    Seals     Seems     Seized
 7.  We        Free      Be        See
 8.  Mapped    Match     Map       Matched
 9.  Parts     Park      Parked    Part
10.  Real      She'll    Wheel     Feel
11.  Juts      Jump      Judge     Just
12.  Fire      Prior     Wire      Tire
13.  Great     Straight  Gate      State
14.  Thank     Bank      Rank      Yank
15.  Tight     Tied      Type      Timed
16.  Fake      Face      Failed    Phased
17.  Done      One       Gun       Ton
18.  Hack      Pack      Shack     Fac
19.  White     Right     Bright    Light
20.  Show      Though    So        Row
21.  Head      Held      Helps     Help
22.  Lose      Loose     Loop      Looped
23.  And       Add       Ask       Am
24.  She's     She'd     She       She'll
25.  Flap      Scrap     Cap       Slap
26.  Not       Dot       Shot      Hot
27.  With      Wing      Will      Width
28.  Did       Grid      Mid       Hid
29.  Six       Sixth     Sick      Sit
30.  Plate     Placed    Plane     Place
31.  Tripped   Trims     Trimmed   Trim
32.  Loud      Plowed    Cloud     How'd
33.  Than      Man       Can       Fan
34.  Or        For       Poor      Door
35.  Word      Heard     Bird      Third
36.  Old       Cold      Told      Hold
37.  West      Went      Wet       Well
38.  Be        Beam      Beached   Beach
39.  Notes     No        Note      Nose
40.  Wash      Washed    Watch     Watched
41.  Dust      Duck      Ducked    Ducks
42.  Heat      He'd      He's      Heats
43.  Notch     Knot      Knots     Notched
44.  Flown     Zone      Bone      Tone
45.  It        Its       Is        If
46.  Pick      Chick     Click     Quick
47.  Weak      Sneak     Seek      Peak
48.  Glide     Side      Slide     Guide
49.  Fifth     Fix       Fixed     Fits
50.  Tough     Buff      Rough     Stuff

consisted of approximately 1200 words. From this database, all words which had occurred as
primary words (the first word in a set of two) in database B and had secondary words (the second
word in a set of two) existing in database C were combined into a fourth and final database
(database D). This database of 496 confusable words was used as the vocabulary set for the scenario.
Figure 12 illustrates the combination of the various databases into the final vocabulary.


[Figure 12 labels: DATABASE B; DATABASE C]

Figure 12. Development of the Scenario Vocabulary


The terms "primary word" and "secondary word" have been used by AAMRL/BBA in research
investigations which have included analysis of the information content of specific messages and
vocabularies. These terms simply indicate the position each word has or could have within a
sentence. A primary word is defined as the first word in a set of two words. Each word in a given
vocabulary can be described as a primary word (with the exception of any word which only occurs
as the last word in a sentence). Each primary word will have a set of secondary words associated
with it. A secondary word is defined as the second word in a set of two words. The set of
secondary words following a given primary word will, therefore, consist of any words which
could (based upon the linguistic structure of the vocabulary) immediately follow that primary
word. For example, consider the following sentences: "Henri Matisse was a great painter," "Jane
Austen was a great novelist," "Robert Frost was a great poet." When the word "great" is evaluated
as a primary word, the words "painter," "novelist," and "poet" comprise the set of secondary
words associated with it.
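The primary/secondary relationship can be illustrated with a short Python sketch. It is not the AAMRL/BBA analysis software; the function name `secondary_words` and the lower-casing and punctuation handling are assumptions made only for this illustration. It collects, for every word, the set of words that immediately follow it in a list of sentences.

```python
from collections import defaultdict


def secondary_words(sentences):
    """Map each primary word to the set of words that immediately follow it."""
    followers = defaultdict(set)
    for sentence in sentences:
        # Assumed normalization: lower-case and drop the final period.
        words = sentence.lower().rstrip(".").split()
        # Each adjacent pair is (primary word, secondary word).
        for primary, secondary in zip(words, words[1:]):
            followers[primary].add(secondary)
    return dict(followers)


sentences = [
    "Henri Matisse was a great painter.",
    "Jane Austen was a great novelist.",
    "Robert Frost was a great poet.",
]
pairs = secondary_words(sentences)
print(sorted(pairs["great"]))
```

Words that occur only at the end of a sentence (here "painter," "novelist," and "poet") never become keys of the mapping, which matches the exception noted in the definition of a primary word.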


The actual scenario messages (sentences) were generated by a computer program which utilized the
selected scenario vocabulary (database D). This program generated all possible two- to six-word
sentences which followed the linguistic rules defined by database A (i.e., all sentences modeled the
natural linguistic structure of the actual pilot communications). All messages were then checked
for semantic meaningfulness. Any messages not meeting this criterion were deleted from the set of
possible scenario messages. Messages meeting this criterion were randomly combined to form the
30-minute scenario of two- to six-word sentences. Examples of the actual scenario messages are:


Two-word message:     "Turn base."
Three-word message:   "My gate nine."
Four-word message:    "Change in left turn."
Five-word message:    "Winds still at three eight."
Six-word message:     "Nine hold wait for flight three."
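The report does not describe the internals of the message-generation program, so the following Python sketch is only one plausible way to enumerate every two- to six-word sequence permitted by a table of allowed successors. The function name, the `starts` parameter, and the toy successor table are all hypothetical, and the semantic check described above is not modeled.

```python
def generate_sentences(followers, starts, min_len=2, max_len=6):
    """Enumerate all word sequences of min_len..max_len words allowed by `followers`."""
    results = []

    def extend(path):
        # Record the sequence once it is long enough to count as a message.
        if len(path) >= min_len:
            results.append(" ".join(path))
        if len(path) == max_len:
            return
        # Try every word allowed to follow the last word of the sequence.
        for next_word in sorted(followers.get(path[-1], ())):
            extend(path + [next_word])

    for word in starts:
        extend([word])
    return results


# Toy successor table (hypothetical, not taken from database A or D).
followers = {"turn": {"base", "left"}, "left": {"turn"}}
messages = generate_sentences(followers, starts=["turn"], max_len=3)
print(messages)
```

Because successor tables can contain cycles (as "turn" and "left" do here), the upper length bound is what keeps the enumeration finite.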


7.1.2.  Construction  of  the  Scenario 


The communications scenario has been designed to be a sequence of 40 to 160 messages for each
30-minute time period. Each message will be one of the two- to six-word sentences described in
Section 7.1.1. A two-word call sign, individual to each subject and presented to each subject
prior to the start of the scenario, precedes each message. Each call sign presentation occurs in a
carrier phrase; for example, "Alpha-One, acknowledge." Immediately after the subject's response
to the call sign, the scenario message is presented. A 3- to 5-second break will occur between the
end of the subject's response to a message and the presentation of a new message. Each message
occurs within the framework of a "timeout period." The timeout period will be the length of time
occurring between the end of the message presentation and the time at which the scenario
vocabulary disappears from the display. In other words, the timeout period is the length of time the
subject has to respond to the message. The timeout period will be visually indicated to the subject by a
time clock appearing in the upper right corner of the CRT display. The clock will count down by
seconds as the time for message response decreases. When the timeout period is over, the scenario
vocabulary will disappear from the display and the time clock will become blank, indicating to the
subject that his allotted time for response is over. Table 4 displays the timeout periods
associated with the various message lengths.


TABLE 4. TIMEOUT PERIODS FOR VARIOUS MESSAGE LENGTHS

Message Length     Timeout Period
2 words            10 seconds
3 words            15 seconds
4 words            20 seconds
5 words            25 seconds
6 words            30 seconds
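The values in Table 4 are consistent with an allowance of 5 seconds per word. The report does not state this rule explicitly, so the constant and function below are an inference from the table and are offered only as a sketch.

```python
SECONDS_PER_WORD = 5  # inferred from Table 4, not stated as a rule in the report


def timeout_seconds(message):
    """Return the timeout period for a two- to six-word scenario message."""
    word_count = len(message.split())
    if not 2 <= word_count <= 6:
        raise ValueError("scenario messages are two to six words long")
    return SECONDS_PER_WORD * word_count


print(timeout_seconds("Turn base."))
print(timeout_seconds("Nine hold wait for flight three."))
```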

7.1.3.  Scenario  Presentation 

Each message in the scenario will be presented verbally to the subject over one of five PACRAT
test facility intercom channels (addresses). The subject will hear the message over a set of
headphones. The message will be a complete sentence, two to six words in length, which could
actually occur in an operational flight scenario. A time line depicting the message presentation is
shown in Figure 13. As soon as the message has been presented, a number of words will be
displayed on the three small CRT screens of the subject's test station. If, for example, a six-word
message was presented, 24 words in six columns of four words each would appear. Two columns
of words will appear on each screen. These words will be a partial set of the entire scenario
vocabulary (database D). All words in any given column will belong to the same "family" of confusable
words (see Section 7.1.1 and Table 3). Each word which occurred in the message will appear in
one of the columns. The order in which the words are displayed will be random.

The subject's task is to manually select from the CRT screen, within an allotted time period (see
Table 4), each of the words which occurred in the message. Each word must be selected in the
order that it was presented in the message. Selections are made by pressing the pushbuttons to the
left or right of each CRT screen. Figure 14 summarizes the experimenter/subject activities during a
scenario message presentation. To select words from columns one, three, and five, the buttons
directly to the left of each word will be used. To select words from columns two, four, and six, the
buttons directly to the right of each word will be used. The pushbuttons appearing along the top
and bottom edges of each display will not be used for word selections and, along with any unused
pushbuttons on the six side columns (i.e., not adjacent to a word), will be available for other
functions when needed (e.g., "CLEAR," "ENTER," or "RULES").

Figure 13. Time Line Depicting the Presentation of the Scenario Messages

[Figure 14 content, as recoverable from the original layout]
Time (seconds):     0; 10; 20-40*
Experimenter (E):   Transmits call sign message; transmits communication message; retransmits
                    message.
Subject (S):        Responds to call sign; determines message content; requests repeats as
                    required; performs message selection.
Scenario/Computer:  Generates call sign message to E; generates word groups on S's CRT;
                    generates communication message on E's CRT; collects data (response time,
                    errors, number of repeats, time out).
*Time allowed after message transmission depends on message length.

Figure 14. Experimenter/Subject Activities During Scenario Message Presentation

As each word is selected, it is highlighted and moved, along with its entire column, to the column
corresponding to the order in which it was selected. The column of words previously holding that
position then moves to the position just vacated by the column containing the selected word. The
word selected and the subsequent screen change are shown in Figures 15 and 16. For example, the
message presented might be "Nine hold wait for flight three." The word "nine" appears in the third
column of words on the CRT displays along with the words "none," "night," and "nice." The first
column of words includes "wait," "eight," "late," and "rate." As "nine" is selected, it moves along
with the other words in its column to column one. The words initially in column one ("wait,"
"eight," "late," and "rate") then move to column three.
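The column exchange described for the "nine" example can be sketched in Python as a swap of two list positions. This is an illustration only, not the PACRAT display software: the function name and the 0-based `selection_index` are assumptions, and the contents of the middle column are hypothetical (the text gives only columns one and three).

```python
def select_word(columns, word, selection_index):
    """Swap the column holding `word` into position `selection_index` (0-based)."""
    for source, column in enumerate(columns):
        if word in column:
            break
    else:
        raise ValueError(f"{word!r} is not on the display")
    # The selected column and the column it displaces trade places.
    columns[selection_index], columns[source] = columns[source], columns[selection_index]
    return columns


display = [
    ["wait", "eight", "late", "rate"],   # column one, from the text
    ["old", "cold", "told", "hold"],     # column two, assumed for illustration
    ["nine", "none", "night", "nice"],   # column three, from the text
]
# "nine" is the first word of the message, so its column moves to position 0.
select_word(display, "nine", selection_index=0)
print(display[0], display[2])
```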

If the subject believes an error has been made, either in the word selected or in the order of the
selection, the subject may reselect the entire sentence (provided the timeout period for that
particular message has not expired). To reselect a sentence the subject must press the button (one of the
available pushbuttons) labeled "CLEAR," and reenter the choice. The subject may also at any time
ask for the message to be repeated. When the subject believes the message has been correctly
selected, the button labeled "ENTER" will then be pressed. This will input the data for that trial
(correct or incorrect response, number of repeats, timeout expired, etc.) into the computer. As
soon as a subject has selected the "ENTER" function, or the timeout period has expired, a short (3
to 5 seconds) break will occur. The CRT displays will be blank during this time. At the end of the
break a new message will be presented.

7.1.3.1. Linguistic Rules as a Communication Aid

As described in Section 7.1.1, the vocabulary of the scenario was developed to pattern the true
linguistic structure of Air Force pilot communication. Like the English language, underlying rules
and relationships, both syntactical and semantic, determine the structure of pilot communication.
These rules may or may not be absolute. Similarly, these rules may be consciously or
unconsciously known to the pilot. An example from the English language might be the knowledge of
English speakers that an article would not be followed by a verb. "The" would not be followed by
"ran." Adjectives, however, often follow articles. The words "big" and "bird" might often follow
"the." These linguistic rules help speakers of English to structure their speech. The English
language is, however, made up of a vast vocabulary and a great many rules. For this reason,
determining the probability of occurrence of a given word is often difficult. Situational and lexical
context often serve as cues, but, again due to the large size of the English vocabulary, this still may
be a difficult task.

Fighter pilots, due to their much smaller operational vocabulary and its more strictly limited
situational and lexical contexts of use, may more directly use linguistic rules as communication aids.
For this reason, the nonoperational subjects in this research effort will be provided with the
linguistic rules that structure the messages of the communication scenario. Although this, of
course, will not completely mimic the knowledge and skill that actual pilot subjects will have, it
should help to provide a more accurate representation of that knowledge.

Figure 15. Display Screen as the Subject First Selects the Word "Nine"

Again, the rules provided in the scenario model the linguistic rules found in actual pilot
communication (see Section 7.1.1). These rules may be accessed by the subjects at any time during a message
presentation. To access the rules for a particular message, the subject must first make a word
selection. At that time, the subject may receive a listing of all possible words in the vocabulary which
could precede or follow the selected word. This listing will be displayed on the three small CRT
screens when the subject presses the button labeled "RULES." Figure 16 shows the selected word
(primary) shaded and all related secondary words on the screen outlined. The subject should
then mark one of the displayed words as the next selection (by pressing the corresponding button).
At the discretion of the experimenter, the scenario software may be used with or without the
"RULES" function. The experimenter may also wish to teach the subjects the vocabulary and the
rule sets in several training sessions prior to the actual testing sessions.

7.1.3.2. HUD Display

The software being implemented for the communication scenario will also include a display of the
information given in a standard heads-up display (HUD). Figure 17 illustrates this display. The
HUD display will appear along the outer edges of the large CRT screen. Information on this
display will include the heading, altitude, and airspeed of the aircraft. This information will be
updated throughout the scenario, but will remain constant for the duration of each separate message
presentation (updates will occur between message segments). The HUD display may be used with
the scenario at the experimenter's discretion. If used, subjects may be requested to verbally report
specific aircraft status information (e.g., "State your present altitude"). Requests for status reports
should be treated as separate message segments of the scenario. When the scenario is used alone,
this display may help to add interest to the task. When the scenario is used in conjunction with a
secondary task, this display will serve to direct the subject's attention to the secondary task display
(see Section 7.2).

7.2.  SECONDARY  TASK 

Section 4.3 described the utility of presenting secondary tasks concurrently with the task of interest
when investigating situations where an operator's performance may be, somehow, degraded. SRL
has chosen to provide a secondary tracking task to be used in conjunction with the primary
communication task. The tracking task was chosen for a number of reasons. First, tracking tasks
realistically model the flight task. A large portion of a pilot's time is spent either tracking a target
or stabilizing a system. Second, in the flight environment tracking tasks and communication tasks
will naturally occur together. A pilot is constantly communicating as he flies his aircraft. Third, a
large amount of data has been collected using tracking tasks. The reliability, validity, and
sensitivity of these data support the use of tracking as a secondary task. Finally, due to the information
processing requirements (Figure 18) of the tracking task, it should not significantly interfere with
the communication task. Tracking generally requires a great amount of both visual information
processing (visual input, spatial encoding/central processing) and motor output. The
communication task described earlier in this section should, instead, require a great amount of auditory
processing (auditory input, verbal encoding/central processing) and a minimal to moderate
combination of manual and verbal responses.

Figure 17. HUD Display

The tracking task being implemented is a compensatory tracking task of moderate difficulty. The
tracking task display will appear in the center of the large CRT screen (Figure 19). The HUD
display will appear around the periphery of the tracking display (the dotted lines in Figure 19 would
not actually appear, but merely indicate the perimeter in which the secondary task could be
presented). Use of the HUD display in the communication scenario will aid in directing the subject's
attention to the tracking task. Subject instructions will follow the subsidiary task paradigm (see
Section 4.2.2.3.2). Subjects will be told to maintain performance on the communication task at the
expense of the tracking task. Separate baseline levels of performance should be collected for both
the communication task and the tracking task if both tasks are to be used together. Critical lags
(delays) in the tracking task will be minimal as the joystick system in the PACRAT test facility is
force style. Wickens (1988) has shown that with force style sticks where no position feedback is
given to the operator, lags in the tracking system can be especially damaging to performance. The
use of the tracking task as a secondary task is recommended for AAMRL/BBA's current research
interests. However, other secondary tasks (see Section 5.1.2), for example, the linguistic
processing task, could easily be implemented on the current system. The remainder of this section
will describe the basic components of manual control tasks (tracking tasks) as they relate to the
human operator.

Figure 19. Tracking

7.2.1.  The  Human  Operator  as  an  Element  of  a  Control  System 

Wickens (1984a) describes a manual control task or "tracking task" as any task in which the
control of a dynamic system is accomplished by manipulation of the hands. Manual control tasks
differ greatly in their difficulty and demand characteristics, depending upon the system which is to be
controlled. Examples of these tasks include driving an automobile, stabilizing an aircraft, or
manually assembling miniature components under a microscope. What these tasks have in common is
that the operator must continually adjust some control variable to make it correspond to a
continuous reference signal. A task may be either to stabilize a system in the face of disturbances (for
example, a pilot flying in high winds), or to pursue an evasive target (for example, the target
aiming task of a gunner).

A typical tracking task is illustrated in Figure 20 (Wickens, 1986). In this representation a
time-varying command input, i_c(t), is displayed visually, (D), to the operator, (H). The operator applies
some amount of force over time, f(t), to the control device, (C). The resulting control movement,
x(t), delivers a signal to the system, (G), which leads to the system response, u(t). The operator
exerts control to make u(t) correspond with the command input, i_c(t). This is achieved by
minimizing the error, e(t), or the difference between i_c(t) and u(t). Depending upon its characteristics,
the display may be described either as "pursuit" or "compensatory." If both i_c(t) and u(t) are
presented, the display is a pursuit display. If only their difference, e(t), is presented, the display is
described as compensatory. Finally, disturbance inputs, i_d(t), may affect the system. An example
of a disturbance input is a gust of wind which moves an aircraft from its approach path. Both
disturbances and command inputs represent information presented to the operator. However,
disturbance inputs must be corrected, while command inputs are to be followed.

Figure 20. Representation of the Tracking Loop (Wickens, 1986)

7.3.  DEPENDENT  VARIABLES 

Data  will  be  collected  on  a  number  of  VCE  dependent  variables.  For  the  communication  scenario, 
the  following  measures  will  be  collected  by  the  computer: 

Time  Out  -  Subject  makes  no  response  in  allotted  time. 

Error  -  Subject  makes  an  incorrect  verbal  or  manual  response. 

Response  Time  -  Time  from  the  end  of  a  given  message  transmission  by  the  experimenter  to 
the  end  of  a  subject's  response. 

Repeats  -  Number  of  times  the  subject  asks  for  a  message  to  be  repeated. 

For  the  secondary  task,  the  tracking  task,  the  following  measures  will  be  collected. 

Average Absolute Error (AAE) = \frac{1}{T} \int_0^T |e(t)| \, dt

Root-Mean-Square Error (RMS) = \sqrt{ \frac{1}{T} \int_0^T e(t)^2 \, dt }

where:

T = Task Length
e(t) = Error at Time t
dt = Sampling Interval

AAE  describes  the  operator's  tracking  variability  when  integrated  in  conjunction  with  the  other 
performance  measures,  and  RMS  is  a  measurement  of  performance  variability. 
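For sampled data, both tracking measures can be approximated by replacing the integral over the task length with a sum over samples. The Python sketch below assumes the usual definitions of average absolute error and root-mean-square error, a uniform sampling interval `dt`, and a task length equal to the number of samples times `dt`; the function name and the sample values are illustrative only.

```python
import math


def tracking_errors(samples, dt):
    """Return (AAE, RMS) for error samples taken every `dt` seconds."""
    if not samples or dt <= 0:
        raise ValueError("need at least one sample and a positive sampling interval")
    task_length = len(samples) * dt  # T, assuming uniform sampling
    # Rectangle-rule approximations of the two integrals.
    aae = sum(abs(e) for e in samples) * dt / task_length
    rms = math.sqrt(sum(e * e for e in samples) * dt / task_length)
    return aae, rms


# Made-up error samples, used only to exercise the function.
aae, rms = tracking_errors([3.0, -4.0], dt=0.1)
print(aae, rms)
```

With uniform sampling the interval cancels, so AAE reduces to the mean of the absolute errors and RMS to the square root of the mean squared error. RMS is never smaller than AAE, and the gap grows as the errors become less uniform in magnitude.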

7.4.  ALTERNATE  VCE  SCENARIO  CONFIGURATION 

In addition to the communication scenario described in Section 7.1, an alternate scenario was
proposed by SRL but not chosen for implementation. This scenario was based on a menu selection
approach modeling the menu selection systems found in some advanced aircraft. The same
scenario constraints (see Figure 10) employed during the communication scenario development also
served as guidance for the menu selection task. A variety of menu modules were constructed, each
module representing a different aircraft system (e.g., WEAPONS). A series of menu pages was
constructed for each module, each page representing a different level of that module. Subjects
would be required to make a variety of menu selections based upon verbal instructions from the
experimenter. Such a scenario could also be easily incorporated with a variety of secondary tasks.

7.4.1.  Menu  Selection  Scenario  Design 

The  design  of  the  menu  selection  task  centered  on  the  existing  PACRAT  equipment  configuration 
in  each  of  the  subject  stations  (see  Section  1).  The  three  small  CRTs  would  be  allocated  for  the 
presentation  of  the  menus.  Subject  responses  to  a  menu  selection  request  would  be  accomplished 
using  the  external  pushbuttons  arranged  along  the  outside  edge  of  the  CRTs.  These  buttons  could 
be  considered  multifunction  keyboards  (MFKs). 

The main menu (representing various aircraft subsystems, e.g., stores) would be located on the
leftmost CRT. The main menu comprised the following aircraft subsystems: communication,
navigation, sensors, stores, and systems. Within each of these subsystems, several sublevels
were developed to complete the overall menu tree. For example, all of the subsystems were
developed to the second sublevel, but only the communication subsystem was developed into a third
sublevel. The partial content for the menu tree is depicted in Figure 21.

[Figure 21 labels: "All Subsystem Level One Items"; "Subsystem Level Three Developed Only for
Communication Subsystem Level One Items"]

Figure 21. Partial Menu Selection Task Content

Selection of a subsystem would be accomplished by pressing the appropriate MFK key. The
middle screen would then display the next sublevel for that subsystem. Activating the MFK key
appropriate to the first sublevel would produce a new middle screen depicting the next lower sublevel.
This type of menu selection logic is referred to as branching logic. Although the confusability of
the menu words is low in most cases, the menu content could be structured with inter- or
intra-sublevel confusability and, thus, obtain greater face validity for the word intelligibility aspects of
this scenario.
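Branching logic of this kind can be sketched in Python as a walk down a nested dictionary, where each key press selects one item on the current page and exposes the next lower sublevel. The five top-level subsystems are taken from the text; every lower-level item name below is a placeholder, since the actual menu content (Figure 21) is not recoverable here, and the function name is hypothetical.

```python
# Top-level subsystems are from the text; all lower-level items are placeholders.
MENU_TREE = {
    "communication": {
        "level-2 item A": {"level-3 item A1": {}, "level-3 item A2": {}},
        "level-2 item B": {},
    },
    "navigation": {"level-2 item C": {}},
    "sensors": {"level-2 item D": {}},
    "stores": {"level-2 item E": {}},
    "systems": {"level-2 item F": {}},
}


def page_after(menu, key_presses):
    """Follow a sequence of MFK selections and return the items on the resulting page."""
    page = menu
    for key in key_presses:
        if key not in page:
            raise KeyError(f"{key!r} is not on the current menu page")
        page = page[key]  # branch to the next lower sublevel
    return list(page)


print(page_after(MENU_TREE, []))
print(page_after(MENU_TREE, ["communication", "level-2 item A"]))
```

An invalid key press raises an error in this sketch; in a scored scenario it would presumably be logged as a selection error instead.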

7.4.2.  Menu  Selection  Scenario  Presentation 

The  presentation  of  the  menu  selection  task  would  occur  in  the  same  manner  as  the  communication 
scenario  described  earlier.  The  experimenter  would,  following  the  scenario  script,  transmit  the 
instruction  to  the  subject  and  repeat  the  message  as  requested  by  the  subject.  Activities  required  by 
the  subject  include  perceiving  the  message  and  responding  to  the  message  through  the  use  of  the 
MFKs.  The  outcome  of  the  subject’s  activities  would  be  measures  of  time  outs,  errors,  response 
time,  and  the  number  of  repeats.  Figure  22  illustrates  the  interaction  of  the  experimenter  and  the 
subject  for  an  example  message  presentation  during  the  scenario. 

The menu selection scenario could be accomplished independently of other tasks or combined with
a secondary task. A tracking task performed in concert with the menu selection activity would
provide greater operational construct to the scenario. Other secondary tasks to be used with the
MFK menu primary tasks could also be viable for VCE research.

[Figure 22 labels: TRACKING OUTPUT (i.e., RMSE); COMMUNICATION OUTPUT (error, latency,
repeats, time outs)]

Figure 22. Interaction of the Experimenter and the Subject During a Message Presentation

Section 8

CONCLUSIONS  AND  RECOMMENDATIONS 

8.1.  CONCLUSIONS 

The evaluation of VCE is a complex problem involving a large number of factors. Research on
VCE should include consideration of the following classes of variables:

• Human Component: Information processing requirements and capacities; word recognition
and sentence processing capability in a two-way interactive mode; mental workload.

• Information Component: Syllables, words, sentences, continuous discourse; entropy,
mutual information, and channel capacity measurements; task information requirements.

• Equipment Component: Microphones, amplifiers, earphones, jammers, displays;
natural/synthetic speech.

• Environmental Component: Noise, acceleration, vibration, physiological stressors
(heat/cold, fatigue, pretreatment drugs).

Few actual measures of VCE are reported in the literature, and little theoretical work on the subject
of VCE has been discovered. The majority of research on communication effectiveness has
traditionally centered on unidirectional rather than interactive two-way communications. Additionally,
the literature survey discovered no research on dual-task studies for VCE, although dual-task
studies on speech intelligibility were reported.

8.2.  RECOMMENDATIONS 

Based  upon  the  literature  search,  the  communications  scenario  development,  and  the  selection  of 
the  secondary  task,  a  number  of  recommendations  have  been  made  concerning  the  evaluation  of  the 
selected  VCE  measure  and  the  expansion  of  the  intelligibility  research  incorporating  the  VCE 
measure. 

8.2.1.  Evaluation  of  the  Communication  Scenario 

Because the communication scenario proposed for use in the PACRAT test facility is a newly
developed experimental task rather than a standardized psychometric test, no specific validity, reliability,
and sensitivity data are available. To ensure generalizability and aid in interpretation of data
collected using the VCE scenario, SRL recommends that validity, reliability, and sensitivity
evaluations be made.


Validity evaluations (i.e., the extent to which the scenario actually measures the effectiveness of
pilot voice communication) should be made by comparing data obtained by using the scenario with
data obtained using other standardized intelligibility tests (e.g., MRT, DRT). Data available
for these standardized tests should be positively correlated with data collected using the
communication scenario. Validity evaluations should be made for a variety of situations and stressors
(e.g., the effect of noise, jamming, and workload on communication effectiveness).
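One common way to check for the positive correlation described above is the Pearson correlation coefficient between scores from the scenario and scores from a standardized test collected under matched conditions. The report does not prescribe a particular statistic, so the choice of Pearson's r is an assumption, and the scores in the Python sketch below are invented placeholders, not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Placeholder proportion-correct scores under four hypothetical noise conditions.
scenario_scores = [0.95, 0.88, 0.74, 0.61]
standard_test_scores = [0.97, 0.90, 0.80, 0.70]
r = pearson_r(scenario_scores, standard_test_scores)
print(round(r, 3))
```

A value of r near +1 would support validity in the sense used here. The sketch does not handle the degenerate case where either list has zero variance.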

Reliability evaluations (i.e., the consistency, repeatability, or extent to which two applications of
the same measure yield the same results) of the communication scenario should also be made. Data
collected repeatedly via the scenario under the same experimental conditions should not vary
significantly. Reliability evaluations, like validity evaluations, should be made under a variety of
experimental conditions.

The sensitivity (i.e., specificity, capability of making fine distinctions) of the communication
scenario should also be evaluated. Do data collected using the scenario allow the researcher to
assess the relative potential for communication degradations among various equipment design
options, various operating conditions, and various task situations? Sensitivity data, like validity
data, should be obtained by comparing data collected with the communication scenario to data
collected using a variety of other intelligibility measures.

Additional  evaluation  of  the  scenario  should  include  the  assessment  of  the  number  of  each  message 
length  that  constitutes  a  scenario  of  low,  medium,  or  high  load  level  for  the  operator.  A  primary 
task  with  varying  levels  of  load  will  allow  for  optimal  flexibility  of  VCE  assessment. 

8.2.2.  Development  of  Alternative  Secondary  Task 

The  use  of  secondary  tasks  other  than  the  tracking  task  will  allow  for  the  assessment  of  other 
human  information  processing  resources.  The  tracking  task  assesses  spatial/central  processing  and 
a manual response mode. Additional secondary tasks will allow for the examination of processing
capacities in conjunction with the primary task (the communication task). For example, a linguistic
processing task, although it draws on information processing resources similar to those used by the
primary communication task, would allow for the assessment of performance degradation caused by two
competing tasks. Other types of tasks that might prove useful as secondary tasks include tasks
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requiring  information  integration,  information  manipulation,  detection,  identification,  and  divided 
attention.  Documentation  for  a  secondary  task  battery  should  include  the  following  information  for 
each task: purpose, description, background, reliability, validity, sensitivity, data output
specifications, training requirements, and subject instructions.

8.2.3.  Additional  Research  on  VCE 

Existing VCE-related research should be integrated into a theoretical model. Studies should be
considered on two-way interactive voice communications in which the performance of the operator
depends on the intelligibility of the message. A large number of controlled laboratory experiments
should be performed to investigate the individual factors that comprise the communication
problem. After these individual factors have been investigated, a group of less structured laboratory
experiments should be performed, modeling the natural disorganization of human speech and the large
variety of task and environmental variables that comprise real operational situations.
Finally, field studies (high-fidelity aircraft simulator, in-flight testing, or ground-based systems)
should be performed in various operational situations.
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Appendix  A 

COMMUNICATION  SCENARIO  VOCABULARY 

Appendix A contains the vocabulary to be used in the communication scenario. The listing that
follows is the output of the vocabulary database, database "D2" (see Section 7.1.1 for a description of
how the vocabulary was derived). The 496 unique words that comprise the vocabulary are listed
in the far left column under the heading "primary word." A primary word is defined as the first
word received in any set of two words, or a^. Explanations of the remaining column headers are
as follows:

"FIRST"           if = 0   The primary word at the left cannot appear as the first word in a sentence.
                  if = 1   The primary word at the left can appear as the first word in a sentence.

"LAST"            if = 0   The primary word at the left cannot appear at the end of a sentence.
                  if = 1   The primary word at the left can appear at the end of a sentence.

"SECONDARY WORD"           The word received immediately after a^. All words in this column
                           may occur directly after the primary word to the left. The number at
                           the top of this column indicates the number of secondary words in
                           this column.
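The column definitions above amount to a small data structure: each primary word carries a FIRST flag, a LAST flag, and a set of permitted secondary words. The sketch below shows one way such entries could be represented and used to check whether a word sequence is admissible. It is an illustration under stated assumptions: the names `VocabEntry`, `VOCAB`, and `is_valid_sentence` are hypothetical, the LOW and STAY entries are transcribed from the listing that follows, and the secondary-word list for LEFT is deliberately abbreviated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabEntry:
    can_start: bool        # "FIRST" column: True if the word may open a sentence
    can_end: bool          # "LAST" column: True if the word may close a sentence
    secondary: frozenset   # words that may occur directly after this word


# Small excerpt for illustration; the full database holds 496 primary words.
VOCAB = {
    "LOW": VocabEntry(True, True, frozenset(
        {"AND", "KEY", "LOW", "SPEED", "STATE", "STAY", "TEN", "THREE", "TURN"})),
    "STAY": VocabEntry(True, False, frozenset(
        {"AT", "CLEAN", "CLEAR", "DID", "FOR", "HERE", "IN", "LEFT",
         "SIX", "TEN", "WEST", "WITH"})),
    # Abbreviated on purpose: the listing gives 35 secondary words for LEFT.
    "LEFT": VocabEntry(True, True, frozenset({"AND", "AT", "BACK", "TURN"})),
}


def is_valid_sentence(words, vocab=VOCAB):
    """Check FIRST/LAST flags and every adjacent primary/secondary pair."""
    if not words or any(w not in vocab for w in words):
        return False
    if not vocab[words[0]].can_start or not vocab[words[-1]].can_end:
        return False
    # Each word must be a permitted secondary word of the word before it.
    return all(nxt in vocab[cur].secondary for cur, nxt in zip(words, words[1:]))
```

With this excerpt, `is_valid_sentence(["LOW", "STAY", "LEFT"])` is accepted, while `["STAY"]` is rejected because STAY cannot end a sentence. The count printed above each secondary-word column in the listing can serve as a consistency check, for example `len(VOCAB["LOW"].secondary) == 9`.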
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LEAK 

LINE 

SO

STATE

STILL

THREE

WOULD 


GAP  0  1  0


GAS  01  2 

OR 

WHEN 


GATE  0  0  4

IS 

NINE 

TAC 

THREE 


GAVE  00  1 

YOU 


GEAR  11  8 

AND
CHECK
FOR
IS
LOW 
STOP 
THERE 
TOUCH 


GET  1  0  19

ALL
BACK
CAB
CLEAR
FIVE
HERE 
HOLD 
IN 
IT 

MORE 

NEW 

ONE 

RID 

SOME

THAT 


THROUGH 

TWO 

WITH 

YOU 


GLAD  0  0  1

IT


PASS 

PAST 

PATH


GO  11  23 

ALL 

AND 

AT 

BACK
BY

CHECK
FOR
GEAR
GREEN

HALF
I'LL 
IF 
IN 

LEFT 

LOW 

ONE 

RIGHT

ROUND 

THAT 

THREE 

TWO 

WITH 

YOU'LL 


GOOD


IQ 


CALL 

CHECK 

CLIP 

CUT 

DAY 


FLIGHT 
FOR 
FOUR
GO

GOOD

JUST 

LUCK 

ONE 

SHOW 

START 

STILL 

THEN 

TIME

WHEN 

YOU'VE 


GOT  1  1  24

ALL

BACK

EIGHT

FIVE

FLIGHT

FOR

FOUR

GOOD

HERE

IS

IT

LEFT
ME

MY 

ONE 

SIGHT 

SIX 

SOME 

THAT 

THREE 

TWO 

WITH 

WORD

YOU 


GRAB  0  0


GRADE 


•3 


1 

MODE 


GRAND 


1 

I'LL 


GREAT 


1  1 


GREEN 


GRID


1  1 


FOR
LIGHT 


STATE 

SWITCH 


GROSS


0  1 


GROUP


1  1 


FOUR
I'LL
THREE


GUESS 


0  1 


1 

THAT 


GUIDE 


GULF 


IN 


EIGHT 


HACK 


HAD 


0 


0 


3 


HAD 
IN
NO 


HALF  1  1  6 

SLIDE 

LEFT 

MILE

MILES

RIGHT

TAKE 


HAND  0  1  3 

TURN 

TWO 

WINS 


HANS  0  C  3 

LEFT

RIGHT

SLIGHT


HAS  1  0  4

GOT

LEFT 

LOST 

TWO 


HAVE  1  1  20 

EIGHT

FIVE

HOT 

IT 

LET 

MODE 

MY 

NINE 

NO 


NOT 
ONE 
OR

RAIN 

RED 

RIGHT 

THAT 

THREE 

TWO 

WITH 

YOU 


HAWK  2  1  5 

AT 

FOUR

ONE 


HAZE  0  0 


CAN 

COMES 

GOT 

HAS 

IS 

JUST 

SET 

SHOULD 

WANTS 


HE'LL  1  0  1

BE


HE'S  1  0  8


AT 

IN 

JUST 

LOW 

NO 

NOT 

RIGHT 

STILL 


HEAD  0  0  3

BACK

SOUND 

PLEASE 


HEAR  01  4 

IS 
ME

THAT 

YOU 


HEARD  0  0  1 


ME


HEIGHT  0  1 


HELP  1  1  1 


HE 


HELPS  0  3  1 

SET 


HER  01  0 


HERE  01  12 

AND 

AT 

FIVE
FOR
I'LL 
IF 
IN 

PLEASE 

TAC 

THEN 


WE

YOU 


HIGH  1  1 

AT 

FIVE
KEY 
LEFT 
TWO 


HIS 


Q 


1 


a 


ASE 


HIT 


0 


1 

BASE 


HITS 


0  o 


1 


IN 


HO 


0  1 


2 


LEFT 

YOU 


HOLD  '  '

FIVE
FOR
HE
MY

NINE

PLEASE

WAIT 

YOU 


HOLE 


0  0 


1 

RING 


HOME


0 


0 


1 


CALL 


HOP  0  1 


HOT  0  1  6 

BUT

FIVE 

LAST 

SIX 

THREE 

TWO 


HOW'D  1  3  1

YOU 


I  1  1  35

AM

CALL

CAN

CAN'T

COME 

COULD 

DO 

GAVE 

GET 

GO 

GOT 

GUESS 

HAD 

HAVE 

HEAR 

HEARD 

JUST 

LEFT 

LOST 

MADE 

NEED 

ONE 

READ 

SAID 

SAY 

SEE 


SHOW 

STILL 

THINK 

THOUGHT 

TOOK 

WANT 

WENT 

WILL 

WOULD


I'D 


0 


3 


SO 

LIKE 

SAY 


I'LL  1  0  26

BE

BRING 

CALL 

CHECK 

GET 

GO 

HAVE 

HOLD

JOIN 

KEEP 

LEAVE 

LOCK 

MAKE 

MEET 

NEED 

PICK 

PUT 

SAY 

SEE 

SET 

SWITCH 

TAKE 

TALK 

TRACK 

TRY 

WAIT 


I'M 


0 


11 


AT 

BACK

FIVE 


GLAD 

HIGH 

IN 

NOT 

SIX 

STILL 

THREE 

WITH 


ICE 


0 


IF  1  1  12

HE 

HE'S 

IT'S

NOT 

SO 

THAT 

THEY 

WE 

YOU 

YOU'D 

YOU'LL 


IN  11  52 

AND 

AT 

3ACK 

BASE 

BOOK 

BOUND 

CASE 

CHECK 

SLEET 

FLIGHT

FOR 

FOUR 

HERE 

HIGH 

HIS 

HOT 

I 

I'LL 

IN 

IT 

JUST 

LEFT 


LESS 

LOW 

MY

NINE 

ONE 

OR 

PLACE 

PLEASE 

READ 

REAL 

SAY 

SIGHT 

SITE 

SO 

STATE 

TANK 

TEN 

THAT 

THEN 

THERE 

THREE 

TOUCH 

TOW 

TURN 

TWO 

USE 

WANT 

WHEN 

WITH 

YOU 


IS  1  1  42

AND

AT

BACK

BASE

CLEAR

COLD

EIGHT

FIVE 

FLIGHT 

FOUR 

GEAR 

GOOD 

HE 

HIGH 

IN 

IT 

JUST 

LEFT 

LOCK 

LOUD 

LOW 


MODE 

MIME 

NO

NOT 

ONE 

PLACED 

RED

RIGHT 

SIX 

SOFT

SPEED 

STATE 

STILL 

STRIKE 

TAC 

TEN 

THAT 

THREE 

TWO 

WET 

WITH 


IT  1  1  38

ALL 

AND 

AT 

BACK 

BLOWN 

BY 

CHECK 

COME 

COMES 

COULD 

DID 

FIVE

HERE 

IF 

IN 

IS 

IT'S 

JUST 

MAY 

NICE 

PAST 

RIGHT 

SEEMED 

SEEMS 

SET 

SHOULD 

SIX 

SO 

SOUNDS 

STILL 


STRAIGHT 

THAT 

THEN 

WHEN 

WIDE 

WILL 

WORKED 

YOU'VE 


IT'S  1  J  19 

ALL 

AT 

CLEAR 

COME

EIGHT 

FIVE 

FOUR 

r 

IN 

JUST 

LOUD 

NOT 

ONE 

PACK 

RED 

SIX 

STILL 

THREE 

TWO 


ITS  00  0 


JET  01  5 

EIGHT
FIVE
HE'S
IN 

THAT 


JOIN 


0 


2 


AT 

IN 


JOINED 


0 


0 


1 


WITH 


JOY 


0 


1 

BUT


JUST  1  1 

ACT 

BRING 

CALL 

CAME

DROP

FALL

FOUND

GET

GO

HOLD

IN 

JOINED 

KEEP 

KIND 

MAKE

MOVED

NEED 

NO 

NOT 

SOUTH 

STAND 

START

STAY 

TAKE 

TELL 

WENT


KEEP 


4 


IT 

THAT 

TWO 

YOU 


KEY 


0 


2 


THERE 

WITH 


KILL 


0 


0 


0 


KIND 


KNOCK 


KNOT 


KNOTS 


LACK 


LAND


AND 

I 

ICE 

SWITCH 

TWO 


AND
AT 

FIVE
LINE 
MIGHT 
SO 

THOUGH 

TWO 

WET 

WHEN 


LANE 


AND 

STILL 


LAST 


1 


1 


5 


AT 

GO 

RUN 

TWO 

WAIT 


LATE  1  1 

FIVE 
FOUR
ONE

SIX

TWO 


LAY 


0  0 


1 


IN 


LEAD 


2 

ONE 

THREE


LEADS  1  3 


LEAK  0  2 

IN 

OR 


LEAST 

C 

J 

0 

LEAVE 

0 

1 

2 

FOURTH

HERE 
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LEFT  1  1  35 

AND 

AT 

BACK 

BASE 

CLEAR 

CROSS 

CUT 

FIVE

FOR 

FOUR 

GREEN 

HAND 

HERE 

HOT 

I 

IN 

LONG 

MILE 

NEXT 

NINE 

ONE 

OR 

SIDE 

SIX 

STATE 

TEN 

THERE

THREE 
TOUCH 
TURN 
TWO
WHEN
WHERE
WIND 
WING 


LEG  0  0  1

IS


LESS  0  1  2 

FIVE
THAN 


LET  13  2 
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ME 

YOU 


LIFT


0 


1 


AT 

WINDS 


LIGHT  0  1 

IS 

IT 


CLEAR 

GO 

I'M 

IT'S 

ITS 

LIVE 

ME 

ONE 

RIGHT 

SIX 

SOME 

THAT 

THREE 

TWO 

WE'RE 

WINDS 

YOU 

YOU'LL 


LINE  1  1  5 

CLEAR 
FOR
IN 

THEN 

THREE 

TOUCH 


LINK  0  0 


1 

WITH 


LIST 


0 


o 


2 


ONE 

THAT 


LIVE  0  0 


0 


LOAD  C 


l 


AT 

IN 


LOCK 


1 

ALL 


LOCKED 


2 


CLEAR 
FOR


LOG  0  1  3 


LONG  11  2 

IS 

IT'S 


LOOK  1  1  4 

AT 
FOR
GOOD 
LIKE 


LOOP 


0 


0 
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LOST 


0 


G 


5 


ALL 

HIS 

IT 

ME
MY 


LOT  0  0

WE 


LOUD  1  3  3 

AND 

BUT 

CLEAR 


LOW  1  1  9 

AND 

KEY 

LOW 

SPEED 

STATE 

STAY 

TEN 

THREE 

TURN 


LUCK 


0  1 


1 


NEXT 


MADE 


0  J 


1 

THAT 


MAIN 


IS 

SKY 


MAKE 


1 


0 


7 


IT 

LEFT 

LOW 

ONE 

SPEED 

THAT 

WAVE 


MAN  1  1  1

SIX 


MARK  1  1  5 

AT 

FOR 

FOUR 

TIME 

WHITE 


MARKS  0  1  0


MAY  0  0  3

BE

HAVE 

PATH 


ME  0  1  20

AT

BACK

BASE

CHECK 

FOR 

HERE 

I 

I'D 

IF

OR 

PLEASE 

PRIOR 

SAME 


TALK 

THAT 

TWO 

USE 

WILL 

WITH 

WORK 


MEAN


0  0 


1 


RIGHT


MEET 


C 


2 

FOR 

YOU 


MEN  10  2 

AND 

IN 


MERGE  00  0 


MID  0  1  0


MIGHT  0  0  4

BE

CHECK 

HAVE 

NEED 


MIKE  1  1  6 

CHECK 

ONE 

PUSH

RIGHT

THREE 

WIND 


MILE 


0 


1 


22 


AND 

CALL 

CUT 

FLIGHT
FOR
HIGH 
I 

IS 

LEFT 

ONE 

PAST 

PRIOR

RIGHT 

SEA 

SEE 

SHOULD 

SIX 

SPREAD 

STRAIGHT 

TEST 

THEN 

THREE 


MILES  0  1  36 

AND 

AT 

CALL 

CHECK 

DO 

EAST 

EIGHT 

FIVE 

FLIGHT

FLY

FOR

I'D 

I'M 

IN 

IS 

KEEP 

LEFT 

LOCK 

ONE 

RIGHT 

SAY 

SEE 

SEEMS 

SIX 

SOUTH 

STATE 
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STAY 

STRAIGHT 

TAKE 

THEN 

THREE 

TURN 

TWO 

WE 

WEST 

YOU 


mix  30  1 

IT 


MODE  1  0  5 

IS 

ONE 

SQUAWK 

THREE 

TWO 


MOON  1  3  1 

TWO 


MORE 


0 


4 

BASE
GLIDE
LOW 
TIME 


MOVE 


U 


1 


IT 


MOVED 


0 


0 


0 


MUCH 


0 


1 


5 


FOR

FUEL

WE'LL 

WOULD 

YOU 


MUST  0  3  1 

BE


MY  1  1  14

BASE

FUEL

GATE 

GREEN 

GROUP 

I 

LAST 

LEFT 

ONE 

PUSH 

RIGHT 

SIDE 

THREE 

TURN 


NAME  0  0 

IS 


NEAR  00  2 

AIR 

END 


NEED  1  1  7 

ALL 
FOR 
IF 
ME
MY 

SOME 

YOU 


NEEDS 


0 


0 


0 


NET  11  6 

AT 

GO 

ONE 

STRIKE 

THAT 

THREE 


NEW  1  J  3 

ONE 

STATE 

WAKE 


NEXT  1  1  5 

FEW 

ONE 

RUN 

TIME

TOUCH 


NICE  00  3 

AND 

DAY 

FLIGHT 


NIGHT  1  1  2 

THREE 

TIME 


NINE  1  1  52

ALL

AND

ARE 

AT 

BASE 


CALL 

CAN 

CHANGE 

CHECK 

CLEAR 

COULD 

CROSS 

EIGHT 

FIVE

FOR

FOUR

HOLD 

I'LL 

I'M 

IF 

IS

KNOTS 

LATE 

LEFT 

MILE

MILES

MODE

NEW 

NINE 

NO 

ONE 

PICK 

PUSH 

RED 

RIGHT 

SAT 

SEATS 

SIX 

SQUAWK 

STATE 

STEER 

TEN 

THANK 

THREE

TIME 

TURN 

TWO 

WE'LL 

WILL 

WIND 

WOULD 

YOU 


1  T 

3  LOW 
CHANGE


FOR

GOOD 

I 

I'M 

JOY 

NEED 

NOT 

RANGE 

READ 

WE

WE'RE 


NOSE  0  1 

AT 


NOT  1  1  18

AND 

AT 

CLEAR 

COME 

FLY

FOR 

GET 

HAVE 

HOLD 

IF 

IN 

MUCH 

RIGHT 

SHOW 

SO

TOO 

WORK 

YET 


OLD  0  0  1

ONE 


ONE  1  1  115

AND

ARE

AT

BALL
BASE
BE


BLUE

CALL

CHANGE 

CHECK 

CLEAR 

COME 

DID 

DO 

EAST 

EIGHT 

FIND

FIVE

FLIGHT

FLY

FOR

FOUR

FREE

GEAR 

GET 

GLIDE 

GO 

GOOD 

GOT 

HALF

HAS 

HAVE 

HOLD 

HOT 

I 

I'LL 

I'M 

IF

IN 

lA 

ITS 

KNOT 

LATE 

LEFT

LIGHT 

LIKE 

LOAD 

LOCK 

LOOK

LOUD 

LOW 

•'ACE 

MARK 

MARKS 

MILE

MILES 

MORE 

MOVED 

MY 

NEEDS 

NINE 

NO 


ONE 

PASS 

PLAN 

PLEASE

PUSH 

READ 

RED 

RIGHT

SAY 

SIDE 

SIGHT 

SITE 

SIX 

SOUTH 

SPARE

SPEED

SQUAWK

STAND 

STATE 

STEEP 

STRAIGHT 

STRIKE 

SWITCH 

TAC 

TAKE 

TEN 

TENTH 

TEST 

THAT 

THEN 

THESE 

THIRD 

THREE 

THROUGH 

TIGHT

TIME

TRIED 

TURN 

TWO 

WE

WE'LL

WE'RE

WE'VE 

WERE 

WHEN 

WHICH 

WHO

WILL 

WIND 

WITH 

WOULD 

YOU 

YOU'LL 
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1 


3 


21 


OR 


ARE 

AT 

DO 

FIVE 

FOUR
so 

LEFT 

LESS 

NOT 

ONE 

RIGHT

SIX 

SO 

SWITCH 

THAT 

THREE 

TOUCH 

TWO 

WE

WHEN 

YOU 


PACE  0  0  1 

SPEED 

PACK  1  3  1 

ONE 

PACKS  0  1  0 

PAN  1  0  0 


PARK  1  1  2 

AND

THAT 


PARKED  J  0


1 

NEXT 


PART 


0 


1 


0 


PARTS 


0 


PASS 


CODE 

COUP 

IT 

LEFT 

SIX 

THAT 

TWO 

YOU 


PAST 


PATH 


PHASE 


10 

AND 

9EAR 

CALL 

IS

ONE 

RIGHT

SET 

THREE 

TURN 

TWO 


STATE 


PICK 


j 


1 


IT 


PLACE


2 

AND 

THREE 


PLACED


0 


1 


0 


PLAN 


PLANE 


PLATE 


PLEASE 


PRIOR 


PULLED 


PUMP


AND 
FIVE


AND 

MOLE 


AND 

CALL 

POP 

SET 

IF

fASS 

STATE 

SWITCH 

THREE 
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1 

1 

m 

PUSH 

0 

0 

4 

1 

1 

AND 
FOR

THREE 

TIME

1 

PUSHED 

f* 

u 

3 

1 

1 

IN 

1 

PUT 

1 

u 

4 

1 

■ 

FOUR 

IT 

ONE 

YOU 

■ 

■ 

QUICK 

0 

0 

1 

1 

T  *10 

1 

1 

QUIT 

1 

3 

0 

■ 

■ 

PACE 

0 

0 

1 

1 

HERE 

1 

PAID 

0 

Q 

1 

1 

AT 

1 

RAIN 

3 

n 

w 

0 

1 

■ 

RAISE 

0 

1 

3 

■ 

1 
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RAMP 


0 


1 


0 


RAN  0  0  1 

AND 


RANGE  0  1  0


RAP  00  1 

FOR


RATE  0  1  4

DO 

EIGHT 

JUST 

THREE 


RAY  0  1  6 

AT 

CHECK 

IS 

READ 
THREE 
WE


REACH  0  0  1 

'A  C 


READ  1  1  6

BACK
LOUD
ME

THAT
TWO
YOU 


REAL


1 


0 


2 


CLEAR 

GOOD 


REAR 


RED 

CAT 

FOR 

HOT 

LIGHT 

ONE 

RED 

THREE 

TWO 

WHITE 


?I°  00  0 


0  1  o 

1  1  9 


RIGHT 


42 


AND 

AT 

BACK
BASE
BE

CALL 

CAN 

CLEAR 

CROSS 

EIGHT 

FILE

FIVE

FOR

FOUR

HALF 

HAND 

HERE 

I 

IN 

IS 

LEFT 

LOOK 

LOW 

NEXT 


RUN


0 


1 


2 

WAY 

YOU


RUSH  0  0  1

STAND 

SAC  00  1 

ONE 

SAFE  0  1  0 


SAID 


0  o 


u 

THAT 
WE

WE'LL 

YOU 


SAME  1  1  5 

MODE

ONE 

TWO 

WINDS 

WITH 


SAVE  1  0 


2 

THAT 

TIME 


SAW 


0  0 


1 


IT 
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SAY 


1 


1 


21 


FLIGHT

I 

IT 

LOW 

MADE

MODE 

NEXT 

ONE 

SAY 

STATE 

THAT 

TYPE 

WAY 

WE 

WE'RE 

WE'VE 

WEST 

WHEN 

WHY 

WINDS

YOU 


SEA  00  1 

VIEW 


SEAL  3  0  1 

AND 


SEATS  3  1  1 

FOR 


1  1 

BACK
HERE 
IF 

IT'S 

ME 

NO 

WHERE 

YOU 


ui 


SEEM 


SEEMED 


SEEMS 


SEEN 


SEND 


SHACK


SHA3E 


SHE'S 


LIKE 


LOCK 


HERE 

IT 


IT 

THESE

WOULD 

YOU 


STILL 


TAKE 
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1 

1 

SHOT 

0 

1 

1 

I 

IN 

1 

SHOULD 

1 

0 

6 

1 

1 

BE

CRASH 

HAVE 

KEEP

READ 

SEE 

1 

SHOW 

0 

1 

5 

1 

1 

CALL 

IT 

HE 

THREE 

YOU

1 

SIDE 

0 

1 

3 

1 

1 

AND 

FOR 

THERE 

1 

SIGHT 

0 

1 

6 

1 

1 

1 

AND 

AT 

BUT

FOR 

GEAR 

TEN 

1 

SITE 

c 

1 

2 

1 

DOES 

PLEASE 

1 

SIX 

1 

1 

30 

1 
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AND 

ARE 

AT 

BACK
BALL
BASE
BE

CALL 

CAN 

CHANGE 

CHECK 

CLEAR 

DASH 

DO 

DOES 

EIGHT 

FIVE 

FLIGHT 

FLY 

FOR 

FOUR 

GATE 

GO 

GOOD 

GRAS 

HAS 

HE'S 

I 

I'D 

I'LL

I'M

IF

IN 

IS 

KNOTS 

LAST 

LATE 

LEFT

LIKE 

LOCK 

LOUD

LOW

MAKE

MILE

MILES

NINE 

NOT 

ONE 

OR 

0  ACE 

RED 

R  IGHT 

SAY 

SIX 

SPEED


SQUAWK

STAND 

STATE 

STAY 

STILL 

STRIKE 

SWEEP 

SWEET 

SWITCH 

TAKE 

TEN 

TEST 

THEN 

THREE 

TIME 

TURN 

TWO 

WE'RE 

WELL 

WILL 

WIND 

WITH 

WOULD 

YOU 

YOU'LL 


SIZE  00  0 


SKI?  0  1  0 


SKY  10  1 

CLEAR 


SLIDE  01  2 

STEER 

TWO 


SLIGHT  0  0  5 

ICE 
LOT 
R  IGHT 
STEER 
TAIL 
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SLING  0 


SLOW  1  1 


SO  11  11 

FAR 

FORTH 

I 

I'M 

LONG 

MUCH 

THAT 

WE 

WE'LL 

YOU 

YOU'LL 


SOFT  1  1  0


SOME  0  1  10

AIR

FUEL

GAS

HAZE

I

KIND
MORE
NEAR
PARTS
SLIGHT


SOON  0  1  1

STAY


SOUND  0  0  1

WE'LL

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SOUNDS 


0 


1 


1 


GOOD 


SOURCE  0  0  1 

AND 


SOUTH  0  1  4 

AT 

BOUND 

FAST 

HERE


SPACE  00  0 


SPARE  0  3  1 

ONE 


SPEED  1  1  10

AND

AT

FIVE

FOR

FOUR

ONE

SIX 

TEN 

THREE 

TWO 


SPLIT  0  0  1 

DO 


SPOT  0  0  4


FIVE
FOUR 
THREE 
TURN 


SPREAD  0  1  1

SLIGHT 


SQUAWK  1  1  5 

AND 

FOUR 

ONE 

SIX 

STAND 


STAGE  0  0  1 

RIGHT 


STAND  1  0  4 

BY

CLEAR 

IS 

IT 


STAR  1  0  1 

ONE 


START  1  1  3 

AT 

SIND 

LIVE 


STATE 


20 
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AND 

BASE 

CASE 

EIGHT 

FIVE 

FOR 

FOUR 

FUEL 

IS 

MODE
NINE 
ONE 

PLEASE 
RIGHT 
SIX 
TEN 
THREE 
TWO
WE'RE
WHEN 


STAY  1  0  12 

AT 

CLEAN 

CLEAR 

DID 

FOR

HERE 

IN 

LEFT 

SIX 

TEN 

WEST 

WITH 


STEER  1  1 


STERN  0  1 
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